snipgenie package

Submodules

snipgenie.gui module

snipgenie GUI. Created Jan 2020 Copyright (C) Damien Farrell

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

class snipgenie.gui.App(filenames=[], project=None)[source]

Bases: QMainWindow

GUI Application using PySide2 widgets

about()[source]
add_dock(widget, name)[source]

Add a dock widget

add_file(filter='Fasta Files(*.fa *.fna *.fasta)', path=None)[source]

Add a file to the config folders

add_gc_mean(progress_callback)[source]

Get mean GC to indicate contamination

add_mapping_stats(progress_callback)[source]

Get mapping stats for all files and add to table

add_mask(filename=None)[source]

Add mask bed file

add_mean_depth(progress_callback)[source]

Find mean depth for bam file

add_plugin_dock(plugin)[source]

Add plugin as dock widget

add_read_lengths(progress_callback)[source]

Get read lengths

add_recent_file(fname)[source]

Add file to recent if not present

align_files(progress_callback)[source]

Run alignment for input files. progress_callback: signal for indicating progress in the GUI

alignment_completed()[source]

Alignment/calling completed

calling_completed()[source]
check_contamination()[source]

Blast to common contaminant sequences

check_fastq_table()[source]

Update samples file to reflect table

check_files()[source]

Check input files exist

check_heterozygosity()[source]

Plot heterozygosity for each sample

check_missing_files()[source]

Check folders for missing files

check_output_folder()[source]

Check if we have an output dir

clean_up()[source]

Clean up intermediate files

clear_plugins()[source]

Remove all open plugins

clear_tabs()[source]

Clear tabbed panes

closeEvent(event=None)[source]

Close main window

close_right_tab(index)[source]

Close right tab

close_tab(index)[source]

Close current tab

create_menu()[source]

Create the menu bar for the application.

create_tool_bar()[source]

Create main toolbar

csq_viewer()[source]

Show CSQ table - output of bcftools csq

discover_plugins()[source]

Discover available plugins

fastq_quality_report()[source]

Make fastq quality report as pdf

get_fasta_reads()[source]

Get a sample of reads for blasting

get_right_tabs()[source]
get_selected()[source]

Get selected rows of fastq table

get_tab_indices(tab_widget, tab_name)[source]
get_tab_names()[source]
get_tabs()[source]
import_results_folder(path)[source]

Import previously made results

load_fastq_files_dialog()[source]

Load fastq files

load_fastq_folder_dialog()[source]

Load fastq folder

load_fastq_table(filenames)[source]

Append/load fastq inputs into table

load_plugin(plugin)[source]

Instantiate the plugin and show widget or run it

load_preset_genome(seqname, gbfile, mask, ask)[source]
load_presets_menu(ask=True)[source]

Add preset genomes to menu

load_project(filename=None)[source]

Load project

load_project_dialog()[source]

Load project

load_settings()[source]

Load GUI settings

load_test()[source]

Load test_files

make_phylo_tree(progress_callback=None, method='raxml')[source]

Make phylogenetic tree

mapping_stats(row)[source]

Summary of a single fastq file

merge_meta_data()[source]

Add sample meta data by merging with file table

missing_sites(progress_callback=None)[source]

Find missing sites in each sample - useful for quality control

new_project(ask=False)[source]

Clear all loaded inputs and results

online_documentation(event=None)[source]

Open the online documentation

phylogeny_completed()[source]
plot_dist_matrix()[source]
preferences()[source]

Preferences dialog

processing_completed()[source]

Generic process completed

progress_fn(msg)[source]
quality_summary(row)[source]

Summary of a single sample, both files

quit()[source]
rd_analysis(progress_callback)[source]

Run RD analysis for MTBC species

rd_analysis_completed()[source]

RD analysis completed

read_distributon(row)[source]

Get read length distribution

redirect_stdout()[source]

Redirect stdout

run()[source]

Run all steps

run_threaded_process(process, on_complete)[source]

Execute a function in the background with a worker

run_trimming(progress_callback)[source]

Run quality and adapter trimming

sample_details(row)[source]
save_plugin_data()[source]

Save data for any plugins that need it

save_project()[source]

Save project

save_project_dialog()[source]

Save as project

save_settings()[source]

Save GUI settings

set_annotation(filename=None)[source]
set_mask(filename)[source]
set_output_folder()[source]

Set the output folder

set_reference(filename=None, ask=True)[source]

Reset the reference sequence

set_style(style='default')[source]

Change interface style.

setup_gui()[source]

Add all GUI elements

setup_paths()[source]

Set paths to important files in proj folder

show_bam_viewer(row)[source]

Show simple alignment view for a bam file

show_blast_url()[source]
show_browser_tab(link, name)[source]

Show web page in a tab

show_error_log()[source]

Show log file contents

show_info(msg, color=None)[source]
show_map()[source]
show_nucldb_url()[source]
show_phylogeny()[source]

Show current tree

show_plugin(plugin)[source]

Show plugin in dock or as window

show_recent_files()[source]

Populate recent files menu

show_ref_annotation()[source]

Show annotation in table

show_snpdist()[source]

Show SNP distance matrix

show_variants()[source]

Show the stored results from variant calling as tables

snp_alignment(progress_callback=None)[source]

Make snp matrix from variant positions

snp_typing(progress_callback)[source]

SNP typing for M.bovis

snp_viewer()[source]

Show SNP table - output of core.txt

start_logging()[source]

Error logging

staticMetaObject = <PySide2.QtCore.QMetaObject object>
tree_viewer()[source]

Show tree viewer

update_labels()[source]
update_mask()[source]
update_plugin_menu()[source]

Update plugins

update_ref_genome()[source]

Update the ref genome labels

update_table(new)[source]

Update table with changed rows

variant_calling(progress_callback=None)[source]

Run variant calling for available bam files.

vcf_viewer()[source]

Show VCF table

zoom_in()[source]
zoom_out()[source]
class snipgenie.gui.AppOptions(parent=None)[source]

Bases: BaseOptions

Class to provide a dialog for global plot options

class snipgenie.gui.Communicate[source]

Bases: QObject

newproj
staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.gui.StdoutRedirect(*param)[source]

Bases: QObject

printOccur
start()[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
stop()[source]
write(s, color='black')[source]
class snipgenie.gui.Worker(fn, *args, **kwargs)[source]

Bases: QRunnable

Worker thread for running background tasks.

run(self) None[source]
class snipgenie.gui.WorkerSignals[source]

Bases: QObject

Defines the signals available from a running worker thread. Supported signals are:

finished

No data

error

tuple (exctype, value, traceback.format_exc())

result

object data returned from processing, anything

error
finished
progress
result
staticMetaObject = <PySide2.QtCore.QMetaObject object>
snipgenie.gui.main()[source]

Run the application

snipgenie.widgets module

Qt widgets for snipgenie. Created Jan 2020 Copyright (C) Damien Farrell

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

class snipgenie.widgets.BaseOptions(parent=None, opts={}, groups={})[source]

Bases: object

Class to generate widget dialog for dict of options

apply()[source]
applyOptions()[source]

Set the plot kwd arguments from the widgets

increment(key, inc)[source]

Increase the value of a widget

setWidgetValue(key, value)[source]

Set a widget value

showDialog(parent, wrap=2, section_wrap=2, style=None)[source]

Auto create tk vars and widgets for the corresponding options, and return the frame

updateWidgets(kwds)[source]
class snipgenie.widgets.BasicDialog(parent, table, title=None)[source]

Bases: QDialog

Qdialog for table operations interfaces

apply()[source]

Override this

close(self) bool[source]
copy_to_clipboard()[source]

Copy result to clipboard

createButtons(parent)[source]
createWidgets()[source]

Create widgets - override this

staticMetaObject = <PySide2.QtCore.QMetaObject object>
update()[source]

Update the original table

class snipgenie.widgets.BrowserViewer(parent=None)[source]

Bases: QDialog

Browser widget

add_widgets()[source]

Add widgets

load_page(url)[source]
navigate_to_url()[source]

Called by the line edit when the return key is pressed; gets the URL and converts it to a QUrl object

staticMetaObject = <PySide2.QtCore.QMetaObject object>
update_urlbar(q)[source]

Update the URL bar. This method is called by the QWebEngineView object

zoom()[source]
class snipgenie.widgets.ColorButton(*args, color=None, **kwargs)[source]

Bases: QPushButton

Custom Qt Widget to show a chosen color.

Left-clicking the button shows the color-chooser, while right-clicking resets the color to None (no-color).

color()[source]
colorChanged
mousePressEvent(self, e: PySide2.QtGui.QMouseEvent) None[source]
onColorPicker()[source]

Show color-picker dialog to select color. Qt will use the native dialog by default.

setColor(color)[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.widgets.DynamicDialog(parent=None, options={}, groups=None, title='Dialog')[source]

Bases: QDialog

Dynamic form using baseoptions

get_values()[source]

Get the widget values

staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.widgets.Editor(parent=None, fontsize=12, **kwargs)[source]

Bases: QTextEdit

contextMenuEvent(self, e: PySide2.QtGui.QContextMenuEvent) None[source]
insert(txt)[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
zoom(delta)[source]
class snipgenie.widgets.FileViewer(parent=None, filename=None)[source]

Bases: QDialog

Sequence records features viewer

show_records(recs, format='genbank')[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.widgets.GraphicalBamViewer(parent=None, filename=None)[source]

Bases: QDialog

Alignment viewer with pylab

add_widgets()[source]

Add widgets

load_data(bam_file, ref_file, gb_file=None, vcf_file=None)[source]

Load reference seq and get contig/chrom names

redraw(xstart=1, xend=2000)[source]

Plot the features

set_chrom(chrom)[source]

Set the selected record which also updates the plot

staticMetaObject = <PySide2.QtCore.QMetaObject object>
update_chrom(chrom=None)[source]

Update after chromosome selection changed

value_changed()[source]

Callback for widgets

zoom_in()[source]

Zoom in

zoom_out()[source]

Zoom out

class snipgenie.widgets.MergeDialog(parent, table, df2, title='Merge Tables')[source]

Bases: BasicDialog

Dialog to merge tables

apply()[source]

Do the operation

createWidgets()[source]

Create widgets

staticMetaObject = <PySide2.QtCore.QMetaObject object>
updateColumns()[source]
class snipgenie.widgets.MultipleInputDialog(parent, options=None, title='Input', width=400, height=200)[source]

Bases: QDialog

Qdialog with multiple inputs

accept(self) None[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.widgets.PlainTextEditor(parent=None, **kwargs)[source]

Bases: QPlainTextEdit

contextMenuEvent(self, e: PySide2.QtGui.QContextMenuEvent) None[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
zoom(delta)[source]
class snipgenie.widgets.PlotViewer(parent=None)[source]

Bases: QWidget

matplotlib plots widget

clear()[source]

Clear plot

create_figure(fig=None)[source]

Create canvas and figure

redraw()[source]
set_figure(fig)[source]

Set the figure

staticMetaObject = <PySide2.QtCore.QMetaObject object>
zoom(zoomin=True)[source]

Zoom in/out to plot by changing size of elements

class snipgenie.widgets.PreferencesDialog(parent, options={})[source]

Bases: QDialog

Preferences dialog from config parser options

apply()[source]

Apply options to current table

createButtons(parent)[source]
createWidgets(options)[source]

Create widgets

reset()[source]

Reset to defaults

setDefaults()[source]

Populate default kwds dict

staticMetaObject = <PySide2.QtCore.QMetaObject object>
updateWidgets(kwds=None)[source]

Update widgets from stored or supplied kwds

class snipgenie.widgets.SimpleBamViewer(parent=None, filename=None)[source]

Bases: QDialog

Sequence records features viewer using dna_features_viewer

add_widgets()[source]

Add widgets

find_gene()[source]

Go to selected gene if annotation present

goto()[source]
load_data(bam_file, ref_file, gb_file=None, vcf_file=None)[source]

Load reference seq and get contig/chrom names

next_page()[source]
prev_page()[source]
redraw(xstart=1)[source]

Plot the features

set_chrom(chrom)[source]

Set the selected record which also updates the plot

staticMetaObject = <PySide2.QtCore.QMetaObject object>
update_chrom(chrom=None)[source]

Update after chromosome selection changed

value_changed()[source]

Callback for widgets

zoom_in()[source]

Zoom in

zoom_out()[source]

Zoom out

class snipgenie.widgets.TableViewer(parent=None, dataframe=None, **kwargs)[source]

Bases: QDialog

View row of data in table

setDataFrame(dataframe)[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.widgets.TextViewer(parent=None, text='', width=200, height=400, title='Text')[source]

Bases: QDialog

Plain text viewer

add_widgets()[source]

Add widgets

staticMetaObject = <PySide2.QtCore.QMetaObject object>
class snipgenie.widgets.ToolBar(table, parent=None)[source]

Bases: QWidget

Toolbar class

addButton(name, function, icon)[source]
createButtons()[source]
staticMetaObject = <PySide2.QtCore.QMetaObject object>
snipgenie.widgets.addToolBarItems(toolbar, parent, items)[source]

Populate toolbar from dict of items

snipgenie.widgets.dialogFromOptions(parent, opts, sections=None, wrap=2, section_wrap=4, style=None)[source]

Get Qt widgets dialog from a dictionary of options.

Parameters
  • opts – options dictionary

  • sections –

  • section_wrap – how many sections in one row

  • style – stylesheet css if required

snipgenie.widgets.getWidgetValues(widgets)[source]

Get values back from a set of widgets

snipgenie.widgets.setWidgetValues(widgets, values)[source]

Set values for a set of widgets from a dict

snipgenie.tools module

Various methods for bacterial genomics. Created Nov 2019 Copyright (C) Damien Farrell

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

snipgenie.tools.bam_to_fastq(filename)[source]

bam to fastq using samtools

snipgenie.tools.batch_iterator(iterator, batch_size)[source]

Returns lists of length batch_size.

This can be used on any iterator, for example to batch up SeqRecord objects from Bio.SeqIO.parse(…), or to batch Alignment objects from Bio.AlignIO.parse(…), or simply lines from a file handle.

This is a generator function, and it returns lists of the entries from the supplied iterator. Each list will have batch_size entries, although the final list may be shorter.
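The batching behaviour described above can be sketched in a few lines. This is a minimal illustration, not the library's own code; the name batch_iterator_sketch is hypothetical.

```python
from itertools import islice

def batch_iterator_sketch(iterator, batch_size):
    """Yield lists of up to batch_size entries from any iterable."""
    it = iter(iterator)
    while True:
        # Take the next batch_size entries; fewer if the input runs out
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

batches = list(batch_iterator_sketch(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The final list is shorter because only one entry remains after two full batches.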

snipgenie.tools.blast_fasta(database, filename, **kwargs)[source]

Blast a fasta file

snipgenie.tools.blast_sequences(database, seqs, labels=None, **kwargs)[source]

Blast a set of sequences to a local or remote blast database.

Parameters
  • database – local or remote blast db name; ‘nr’, ‘refseq_protein’, ‘pdb’, ‘swissprot’ are valid remote dbs

  • seqs – sequences to query, list of strings or Bio.SeqRecords

  • labels – list of id names for sequences, optional but recommended

Returns

pandas dataframe with top blast results

snipgenie.tools.checkDict(d)[source]

Check a dict recursively for non serializable types
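A recursive check of this kind might look like the sketch below. It assumes "serializable" means JSON-serializable, which the docstring does not state, and it returns the key paths of offending values; the function name and return format are hypothetical.

```python
import json

def find_unserializable_sketch(d, path=()):
    """Return key paths of values that json.dumps cannot encode."""
    bad = []
    for key, value in d.items():
        here = path + (key,)
        if isinstance(value, dict):
            # Recurse into nested dicts
            bad.extend(find_unserializable_sketch(value, here))
        else:
            try:
                json.dumps(value)
            except TypeError:
                bad.append(here)
    return bad

settings = {'a': 1, 'b': {'c': {1, 2}, 'd': 'x'}, 'e': object()}
problems = find_unserializable_sketch(settings)
print(problems)  # [('b', 'c'), ('e',)]
```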

snipgenie.tools.clustal_alignment(filename=None, seqs=None, command='clustalw')[source]

Align 2 sequences with clustal

snipgenie.tools.concat_seqrecords(recs)[source]

Join seqrecords together

snipgenie.tools.core_alignment_from_vcf(vcf_file, callback=None, uninformative=False, missing=False, omit=None)[source]

Get core SNP site calls as sequences from a multi sample vcf file.

Parameters
  • vcf_file – multi-sample vcf (e.g. produced by app.variant_calling)

  • uninformative – whether to include uninformative sites

  • missing – whether to include sites with one or more missing samples (i.e. no coverage)

  • omit – list of samples to exclude if required
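To illustrate how the uninformative and missing flags could affect site selection, here is a minimal sketch on an in-memory table of calls. It does not read a VCF. It assumes an "uninformative" site is one where all covered samples share the same allele, and it writes N for a missing call; the data layout and function name are hypothetical.

```python
def core_alignment_sketch(calls, samples, uninformative=False, missing=False):
    """calls: {position: {sample: allele or None when there is no coverage}}."""
    kept = []
    for pos in sorted(calls):
        alleles = [calls[pos].get(s) for s in samples]
        if not missing and None in alleles:
            continue  # drop sites lacking coverage in any sample
        if not uninformative and len({a for a in alleles if a is not None}) < 2:
            continue  # drop sites where every covered sample agrees
        kept.append(pos)
    # One concatenated sequence of site calls per sample
    return {s: ''.join(calls[p].get(s) or 'N' for p in kept) for s in samples}

calls = {
    10: {'s1': 'A', 's2': 'G', 's3': 'A'},
    25: {'s1': 'C', 's2': 'C', 's3': 'C'},
    40: {'s1': 'T', 's2': None, 's3': 'G'},
}
samples = ['s1', 's2', 's3']
core = core_alignment_sketch(calls, samples)
with_missing = core_alignment_sketch(calls, samples, missing=True)
print(core)          # {'s1': 'A', 's2': 'G', 's3': 'A'}
print(with_missing)  # {'s1': 'AT', 's2': 'GN', 's3': 'AG'}
```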

snipgenie.tools.dataframe_to_fasta(df, seqkey='translation', idkey='locus_tag', descrkey='description', outfile='out.faa')[source]

Genbank features to fasta file

snipgenie.tools.diffseqs(seq1, seq2)[source]

Diff two sequences

snipgenie.tools.fasta_to_dataframe(infile, header_sep=None, key='name', seqkey='sequence')[source]

Get fasta proteins into dataframe

snipgenie.tools.fastq_quality_report(filename, figsize=(7, 5), **kwargs)[source]

Fastq quality plots

snipgenie.tools.fastq_random_seqs(filename, size=50)[source]

Random sequences from fastq file. Requires pyfastx. Creates a fastq index which will be a large file.

snipgenie.tools.fastq_to_dataframe(filename, size=5000)[source]

Convert fastq to dataframe.

Parameters
  • size – limit to the first size reads; use None to get all reads

Returns

dataframe with reads

snipgenie.tools.fastq_to_fasta(filename, out, size=1000)[source]

Convert fastq to fasta.

Parameters
  • size – limit to the first size reads

snipgenie.tools.fastq_to_rec(filename, size=50)[source]

Get reads from a fastq file.

Parameters
  • size – limit

Returns

biopython seqrecords

snipgenie.tools.features_summary(df)[source]

SeqFeatures dataframe summary

snipgenie.tools.fetch_sra_reads(df, path)[source]

Download a set of reads from SRA using dataframe with runs

snipgenie.tools.genbank_to_dataframe(infile, cds=False)[source]

Get genome records from a genbank file into a dataframe.

Returns

a dataframe with a row for each cds/entry

snipgenie.tools.get_attributes(obj)[source]

Get non hidden and built-in type object attributes that can be persisted

snipgenie.tools.get_blast_results(filename)[source]

Get blast results into dataframe. Assumes column names from local_blast method.

Returns

dataframe

snipgenie.tools.get_chrom(filename)[source]

Get chromosome name from fasta file

snipgenie.tools.get_cmd(cmd)[source]

Get windows version of a command if required

snipgenie.tools.get_fasta_length(filename)[source]

Get length of reference sequence

snipgenie.tools.get_fastq_info(filename)[source]
snipgenie.tools.get_fastq_read_lengths(filename)[source]

Return fastq read lengths

snipgenie.tools.get_fastq_size(filename)[source]

Return fastq number of reads

snipgenie.tools.get_gc(filename, limit=10000.0)[source]
snipgenie.tools.get_mean_depth(bam_file, chrom=None, start=None, end=None, how='mean')[source]

Get mean depth from bam file

snipgenie.tools.get_sb_number(binary_str)[source]

Get SB number from binary pattern using database reference

snipgenie.tools.get_snp_matrix(df)[source]

SNP matrix from multi sample vcf dataframe

snipgenie.tools.get_spoligotype(filename, reads_limit=3000000, threshold=2, threads=4)[source]

Get mtb spoligotype from WGS reads

snipgenie.tools.get_spoligotypes(samples, spo=None)[source]

Get spoligotypes for multiple M.bovis strains

snipgenie.tools.get_subsample_reads(filename, outpath, reads=10000)[source]

Sub-sample a fastq file with first n reads.

Parameters
  • filename – input fastq.gz file

  • outpath – output directory to save new file

  • reads – how many reads to sample from start
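Taking the first n reads from a gzipped fastq file can be sketched with the standard library alone. This assumes four lines per record (no wrapped sequence lines) and writes a file of the same name into outpath, so outpath should differ from the input folder; the function name and output naming are hypothetical, not the library's behaviour.

```python
import gzip
import os
from itertools import islice

def subsample_fastq_sketch(filename, outpath, reads=10000):
    """Copy the first `reads` records of a fastq.gz file into outpath."""
    os.makedirs(outpath, exist_ok=True)
    outfile = os.path.join(outpath, os.path.basename(filename))
    with gzip.open(filename, 'rt') as fin, gzip.open(outfile, 'wt') as fout:
        # Each fastq record is assumed to span exactly four lines
        fout.writelines(islice(fin, reads * 4))
    return outfile
```

Reading only the head of the file is fast, but the sample is not random; reads near the start of a run may not be representative.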

snipgenie.tools.get_unique_snps(names, df, present=True)[source]

Get snps unique to one or more samples from a SNP matrix.

Parameters
  • names – name of sample(s)

  • df – snp matrix from app.get_aa_snp_matrix(csq)

  • present – whether snp should be present/absent

snipgenie.tools.get_vcf_samples(filename)[source]

Get list of samples in a vcf/bcf

snipgenie.tools.gff_bcftools_format(in_file, out_file)[source]

Convert a genbank file to a GFF format that can be used in bcftools csq. See https://github.com/samtools/bcftools/blob/develop/doc/bcftools.txt#L1066-L1098.

Parameters
  • in_file – genbank file

  • out_file – name of GFF file

snipgenie.tools.gff_to_records(gff_file)[source]

Get features from gff file

snipgenie.tools.gunzip(infile, outfile)[source]

Gunzip a file

snipgenie.tools.kraken(file1, file2='', dbname='STANDARD16', threads=4)[source]

Run kraken2 on single/paired end fastq files

snipgenie.tools.local_blast(database, query, output=None, maxseqs=50, evalue=0.001, compress=False, cmd='blastn', threads=4, show_cmd=False, **kwargs)[source]

Blast a local database.

Parameters
  • database – local blast db name

  • query – sequences to query, list of strings or Bio.SeqRecords

Returns

pandas dataframe with top blast results

snipgenie.tools.make_blast_database(filename, dbtype='nucl')[source]

Create a blast db from fasta file

snipgenie.tools.move_files(files, path)[source]
snipgenie.tools.normpdf(x, mean, sd)[source]

Normal distribution function
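For reference, the normal probability density with mean μ and standard deviation σ is exp(-(x-μ)²/(2σ²)) / sqrt(2πσ²). A minimal sketch of that formula follows; the name normpdf_sketch is hypothetical and the library's implementation may differ in detail.

```python
import math

def normpdf_sketch(x, mean, sd):
    """Normal probability density at x for the given mean and sd."""
    var = sd ** 2
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

peak = normpdf_sketch(0, 0, 1)
print(round(peak, 4))  # 0.3989
```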

snipgenie.tools.pdf_qc_reports(filenames, outfile='qc_report.pdf')[source]

Save pdf reports of fastq file quality info

snipgenie.tools.plot_fastq_gc_content(filename, ax=None, limit=50000)[source]

Plot fastq gc content

snipgenie.tools.plot_fastq_qualities(filename, ax=None, limit=10000)[source]

Plot fastq qualities for illumina reads.

snipgenie.tools.records_to_dataframe(records, cds=False, nucl_seq=False)[source]

Get features from a biopython seq record object into a dataframe.

Parameters
  • features – Bio SeqFeatures

Returns

a dataframe with a row for each cds/entry

snipgenie.tools.remote_blast(db, query, maxseqs=50, evalue=0.001, **kwargs)[source]

Remote blastp.

Parameters
  • query – fasta file with sequence to blast

  • db – database to use - nr, refseq_protein, pdb, swissprot

snipgenie.tools.resource_path(relative_path)[source]

Get absolute path to resource, works for dev and for PyInstaller

snipgenie.tools.samtools_coverage(bam_file)[source]

Get coverage/depth stats from bam file

snipgenie.tools.samtools_depth(bam_file, chrom=None, start=None, end=None)[source]

Get depth from bam file

snipgenie.tools.samtools_flagstat(filename)[source]

Parse samtools flagstat output into dictionary
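The sketch below shows one way flagstat text could be turned into a dictionary. It operates on a text string rather than running samtools, keeps only the QC-passed count, and uses the description before any bracketed annotation as the key; the key names and function name are assumptions, not the dictionary layout the library returns.

```python
def parse_flagstat_sketch(text):
    """Parse `samtools flagstat` text into {description: QC-passed count}."""
    stats = {}
    for line in text.splitlines():
        if ' + ' not in line:
            continue  # skip blank or unexpected lines
        passed, rest = line.split(' + ', 1)
        failed, description = rest.split(' ', 1)
        # Drop any trailing "(...)" annotation such as percentages
        key = description.split(' (')[0].strip()
        stats[key] = int(passed)
    return stats

example = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
950 + 0 mapped (95.00% : N/A)
"""
stats = parse_flagstat_sketch(example)
print(stats)  # {'in total': 1000, 'secondary': 0, 'mapped': 950}
```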

snipgenie.tools.samtools_tview(bam_file, chrom, pos, width=200, ref='', display='T')[source]

View bam alignment with samtools

snipgenie.tools.set_attributes(obj, data)[source]

Set attributes from a dict. Used for restoring settings in tables

snipgenie.tools.snp_dist_matrix(aln)[source]

Get pairwise SNP distances from a biopython Multiple Sequence Alignment object.

Returns

pandas dataframe
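A pairwise SNP distance is the number of alignment columns at which two samples differ. The sketch below computes this on plain strings and returns nested dicts instead of a dataframe. It assumes every mismatch counts, including N or gap characters, which the library may handle differently; the function name is hypothetical.

```python
from itertools import combinations

def snp_distances_sketch(seqs):
    """seqs: {name: aligned sequence}; all sequences must be equal length."""
    names = list(seqs)
    dist = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Count positions where the two sequences differ
        d = sum(1 for x, y in zip(seqs[a], seqs[b]) if x != y)
        dist[a][b] = dist[b][a] = d
    return dist

aln = {'s1': 'ACGTA', 's2': 'ACGTT', 's3': 'TCGAT'}
dist = snp_distances_sketch(aln)
print(dist['s1'])  # {'s1': 0, 's2': 1, 's3': 3}
```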

snipgenie.tools.trim_reads(filename1, filename2, outpath, quality=20, method='cutadapt', threads=4)[source]

Trim adapters using cutadapt

snipgenie.tools.trim_reads_default(filename, outfile, right_quality=35)[source]

Trim adapters - built in method

snipgenie.tools.vcf_to_dataframe(vcf_file)[source]

Convert a multi sample vcf to dataframe. Records each sample's FORMAT fields.

Parameters
  • vcf_file – input multi sample vcf

Returns

pandas DataFrame

snipgenie.app module

snipgenie methods for cmd line tool. Created Nov 2019 Copyright (C) Damien Farrell

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

class snipgenie.app.Logger(logfile='log.dat')[source]

Bases: object

flush()[source]
write(message)[source]
class snipgenie.app.WorkFlow(**kwargs)[source]

Bases: object

Class for implementing a prediction workflow from a set of options

run()[source]

Run workflow

setup()[source]

Setup main parameters

snipgenie.app.align_reads(df, idx, outdir='mapped', callback=None, aligner='bwa', platform='illumina', unmapped=None, **kwargs)[source]

Align multiple files. Requires a dataframe with a ‘sample’ column to indicate paired files grouping. If a trimmed column is present these files will be aligned instead of the raw ones.

Parameters
  • df – dataframe with sample names and filenames

  • idx – index name

  • outdir – output folder

  • unmapped – folder for unmapped files if required

snipgenie.app.blast_contaminants(filename, limit=2000, random=False, pident=98, qcovs=90)[source]

Blast reads to contaminants database.

Returns

percentages of reads assigned to each species

snipgenie.app.check_platform()[source]

See if we are running in Windows

snipgenie.app.check_samples_aligned(samples, outdir)[source]

Check how many samples already aligned

snipgenie.app.check_samples_unique(samples)[source]

Check that sample names are unique

snipgenie.app.clean_bam_files(samples, path, remove=False)[source]

Check if any bams in output not in samples and remove. Not used in workflow.

snipgenie.app.copy_ref_genomes()[source]

Copy default ref genome files to config dir

snipgenie.app.csq_call(ref, gff_file, vcf_file, csqout)[source]

Consequence calling

snipgenie.app.fetch_binaries()[source]

Get windows binaries – windows only

snipgenie.app.fetch_contam_file()[source]

Get contam sequences

snipgenie.app.get_aa_snp_matrix(df)[source]

Get presence/absence matrix from csq calls table

snipgenie.app.get_files_from_paths(paths, ext='*.f*q.gz', filter_list=None)[source]

Get files in multiple paths.

Parameters
  • ext – wildcard for file types to parse, e.g. *.f*q.gz

  • filter_list – list of labels that should be present in the filenames, optional
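To show what the default wildcard matches, the sketch below applies it to a list of names instead of searching folders. It assumes a filename passes the filter when it contains any one of the labels in filter_list; that rule and the function name are hypothetical.

```python
from fnmatch import fnmatchcase

def select_files_sketch(names, ext='*.f*q.gz', filter_list=None):
    """Keep names matching the wildcard and, optionally, any filter label."""
    matched = [n for n in names if fnmatchcase(n, ext)]
    if filter_list:
        matched = [n for n in matched if any(label in n for label in filter_list)]
    return matched

names = ['s1_R1.fastq.gz', 's1_R2.fq.gz', 's2_R1.fastq.gz', 'ref.fasta', 'notes.txt']
all_fastq = select_files_sketch(names)
only_s2 = select_files_sketch(names, filter_list=['s2'])
print(all_fastq)  # ['s1_R1.fastq.gz', 's1_R2.fq.gz', 's2_R1.fastq.gz']
print(only_s2)    # ['s2_R1.fastq.gz']
```

The pattern covers both the .fastq.gz and .fq.gz spellings, which is why it uses f*q rather than a fixed extension.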

snipgenie.app.get_pivoted_samples(df)[source]

Get pivoted samples by pair, returns a table with one sample per row and filenames in separate columns.

snipgenie.app.get_samples(filenames, sep='-', index=0)[source]

Get sample pairs from list of files, usually fastq. This returns a dataframe of unique sample labels for the input and tries to recognise the paired files.

Parameters
  • sep – separator to split name on

  • index – placement of label in split list, default 0
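The roles of sep and index can be illustrated with a small sketch that groups filenames by the label obtained from splitting the name. It returns a plain dict instead of a dataframe and does not try to identify which file is read 1 or read 2; the function name and the way extensions are stripped are assumptions.

```python
import os
from collections import defaultdict

def sample_labels_sketch(filenames, sep='-', index=0):
    """Group files by the label found at `index` after splitting on `sep`."""
    groups = defaultdict(list)
    for f in filenames:
        # Drop the folder and everything from the first dot onwards
        stem = os.path.basename(f).split('.')[0]
        label = stem.split(sep)[index]
        groups[label].append(f)
    return dict(groups)

files = ['reads/s1-R1.fastq.gz', 'reads/s1-R2.fastq.gz', 'reads/s2-R1.fastq.gz']
groups = sample_labels_sketch(files)
print(groups['s1'])  # ['reads/s1-R1.fastq.gz', 'reads/s1-R2.fastq.gz']
```

With files named like s1_R1.fastq.gz the separator would need to be '_' instead, otherwise each file ends up with its own label.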

snipgenie.app.get_samples_from_bam(filenames, sep='-', index=0)[source]

Samples from bam files

snipgenie.app.main()[source]

Run the application

snipgenie.app.mapping_stats(samples)[source]

Get stats on mapping of samples

snipgenie.app.mask_filter(vcf_file, mask_file, overwrite=False, outdir=None)[source]

Remove any masked sites using a bed file, overwrites input
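BED intervals are 0-based and half-open, while VCF positions are 1-based, so a site at position p falls inside an interval when start < p <= end. The sketch below applies that rule to in-memory site tuples. It is an illustration of the coordinate logic only, not the library's implementation; both function names are hypothetical.

```python
def load_bed_sketch(lines):
    """Read (chrom, start, end) tuples from BED-formatted lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(('#', 'track', 'browser')):
            continue  # skip blanks, comments and header lines
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals

def is_masked_sketch(chrom, pos, intervals):
    """True if the 1-based position lies inside any 0-based half-open interval."""
    return any(c == chrom and start < pos <= end for c, start, end in intervals)

mask = load_bed_sketch(['chr1\t10\t20\n'])
sites = [('chr1', 10), ('chr1', 11), ('chr1', 20), ('chr1', 21), ('chr2', 15)]
kept = [s for s in sites if not is_masked_sketch(s[0], s[1], mask)]
print(kept)  # [('chr1', 10), ('chr1', 21), ('chr2', 15)]
```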

snipgenie.app.mpileup(bam_file, ref, out, overwrite=False)[source]

Run bcftools for single file.

snipgenie.app.mpileup_multiprocess(bam_files, ref, outpath, threads=4, callback=None)[source]

Run mpileup in parallel over multiple files and make separate bcfs. Assumes alignment to a bacterial reference with a single chromosome.

snipgenie.app.mpileup_parallel(bam_files, ref, outpath, threads=4, callback=None, tempdir=None)[source]

Run mpileup over multiple regions with GNU parallel on Linux or rush on Windows. Separate bcf files are then joined together. Assumes alignment to a bacterial reference with a single chromosome.

snipgenie.app.mpileup_region(region, out, bam_files, callback=None)[source]

Run bcftools for single region.

snipgenie.app.overwrite_vcf(vcf_file, sites, outdir=None)[source]

Make a new vcf with subset of sites

snipgenie.app.read_csq_file(filename)[source]

Read csq tsv output file into dataframe

snipgenie.app.relabel_vcfheader(vcf_file, sample_file)[source]

Re-label samples in vcf header

snipgenie.app.run_bamfiles(bam_files, ref, gff_file=None, mask=None, outdir='.', threads=4, sep='_', labelindex=0, samples=None, **kwargs)[source]

Run workflow with bam files from a previous set of alignments. We can arbitrarily combine results from multiple other runs this way. kwargs are passed to the variant_calling method. Should write a samples.txt file in the outdir if the vcf header is to be relabelled.

Parameters
  • samples – dataframe of sample names, if not provided try to get from bam files

snipgenie.app.site_proximity_filter(vcf_file, dist=10, overwrite=False, outdir=None)[source]

Remove any pairs of sites within dist of each other.

Parameters
  • vcf_file – input vcf file with positions to filter

  • dist – distance threshold

  • overwrite – whether to overwrite the vcf
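The proximity rule can be sketched on a plain list of positions. This assumes that "within dist" means a gap of dist or less and that both members of a close pair are dropped; it works on integers only and does not touch a VCF. The function name is hypothetical.

```python
def proximity_filter_sketch(positions, dist=10):
    """Drop every position that has a neighbour dist or fewer bases away."""
    pos = sorted(positions)
    keep = []
    for i, p in enumerate(pos):
        near_prev = i > 0 and p - pos[i - 1] <= dist
        near_next = i < len(pos) - 1 and pos[i + 1] - p <= dist
        if not (near_prev or near_next):
            keep.append(p)
    return keep

kept = proximity_filter_sketch([100, 105, 300, 1000, 1008, 5000], dist=10)
print(kept)  # [300, 5000]
```

Only the nearest neighbours need checking once the positions are sorted, so the filter runs in a single pass.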

snipgenie.app.test_run()[source]

Test run

snipgenie.app.trim_files(df, outpath, overwrite=False, threads=4, quality=30)[source]

Batch trim fastq files

snipgenie.app.variant_calling(bam_files, ref, outpath, relabel=True, threads=4, callback=None, overwrite=False, filters=None, gff_file=None, mask=None, tempdir=None, custom_filters=False, **kwargs)[source]

Call variants with bcftools

snipgenie.app.worker(args)[source]
snipgenie.app.write_samples(df, path)[source]

Write out sample names only using dataframe from get_samples

snipgenie.aligners module

Aligner methods for bacterial genomics. Created Nov 2019 Copyright (C) Damien Farrell

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

snipgenie.aligners.bowtie_align(file1, file2, idx, out, unmapped=None, threads=2, overwrite=False, verbose=True, options='')[source]

Map reads using bowtie

snipgenie.aligners.build_bowtie_index(fastafile, path=None)[source]

Build a bowtie index.

Parameters
  • fastafile – file input

  • path – folder to place index files

snipgenie.aligners.build_bwa_index(fastafile, path=None, show_cmd=True, overwrite=True)[source]

Build a bwa index

snipgenie.aligners.build_subread_index(fastafile)[source]

Build an index for subread

snipgenie.aligners.bwa_align(file1, file2, idx, out, threads=4, overwrite=False, options='', filter=None, unmapped=None)[source]

Align reads to a reference with bwa.

Parameters
  • file1 – fastq files

  • file2 – fastq files

  • idx – bwa index name

  • out – output bam file name

  • options – extra command line options e.g. -k INT for seed length

  • unmapped – path to file for unmapped reads if required
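
To show how these parameters might map onto a command line, here is a minimal sketch that only builds a shell string and runs nothing. The helper build_bwa_command and the file names are hypothetical, and the pipeline shown (bwa mem piped into samtools sort) is an assumption about a typical invocation, not necessarily the command that bwa_align issues.

```python
import shlex


def build_bwa_command(file1, file2, idx, out, threads=4, options=""):
    """Return a `bwa mem | samtools sort` pipeline as a shell string.

    Hypothetical helper for illustration; it does not execute anything.
    """
    align = ["bwa", "mem", "-t", str(threads)]
    align += shlex.split(options)      # extra flags, e.g. "-k 19" for seed length
    align += [idx, file1]
    if file2:                          # single-end data has no second file
        align.append(file2)
    sort = ["samtools", "sort", "-o", out, "-"]   # "-" reads from stdin

    def quote(parts):
        return " ".join(shlex.quote(p) for p in parts)

    return quote(align) + " | " + quote(sort)


cmd = build_bwa_command("s1_R1.fastq.gz", "s1_R2.fastq.gz", "ref.fa",
                        "s1.bam", options="-k 19")
print(cmd)
```

Quoting each argument with shlex.quote keeps paths containing spaces intact when the string is later passed to a shell.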

snipgenie.aligners.minimap2_align(file1, file2, idx, out, platform='illumina', threads=4, overwrite=False)[source]

Align illumina/ONT reads with minimap2

snipgenie.aligners.subread_align(file1, file2, idx, out, threads=2, overwrite=False, verbose=True)[source]

Align reads with subread

snipgenie.plotting module

Plotting methods for snipgenie. Created Jan 2020 Copyright (C) Damien Farrell

snipgenie.plotting.create_grid(gdf=None, bounds=None, n_cells=10, overlap=False, crs='EPSG:29902')[source]

Create a square grid that covers a geodataframe area or a fixed boundary with x-y coords.

Returns

A GeoDataFrame of grid polygons
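
The geometry behind a square grid can be sketched without any GIS dependency. The helper square_grid below is hypothetical and returns plain bounding-box tuples. It assumes that n_cells counts cells along the x axis and that rows are added until the y extent is covered; the real function returns polygons in a GeoDataFrame with the given crs.

```python
import math


def square_grid(bounds, n_cells=10):
    """Return (xmin, ymin, xmax, ymax) boxes for square cells covering `bounds`.

    Hypothetical helper; assumes n_cells is the number of cells along x.
    """
    xmin, ymin, xmax, ymax = bounds
    size = (xmax - xmin) / n_cells              # side length of each square
    n_rows = math.ceil((ymax - ymin) / size)    # rows needed to cover the y extent
    cells = []
    for i in range(n_cells):
        for j in range(n_rows):
            x0, y0 = xmin + i * size, ymin + j * size
            cells.append((x0, y0, x0 + size, y0 + size))
    return cells


cells = square_grid((0, 0, 100, 50), n_cells=10)
print(len(cells), cells[0])
```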

snipgenie.plotting.create_hex_grid(gdf=None, bounds=None, n_cells=10, overlap=False, crs='EPSG:29902')[source]

Hexagonal grid over geometry. See https://sabrinadchan.github.io/data-blog/building-a-hexagonal-cartogram.html

snipgenie.plotting.display_igv(url='http://localhost:8888/files/', ref_fasta='', bams=[], gff_file=None, vcf_file=None)[source]

Display IGV tracks in jupyter, requires the igv_jupyterlab package. Example usage:

    bams = {'24':'results/mapped/24-MBovis.bam'}
    igv = display_igv(url='http://localhost:8888/files/', ref_fasta='Mbovis_AF212297.fa',
                      gff_file='results/Mbovis_AF212297.gb.gff', vcf_file='results/filtered.vcf.gz', bams=bams)

snipgenie.plotting.draw_pie(vals, xpos, ypos, colors, size=500, ax=None)[source]

Draw a pie at a specific position on an mpl axis. Used to draw spatial pie charts on maps.

Parameters
  • vals – values for pie

  • xpos – x coord

  • ypos – y coord

  • colors – colors of values

  • size – size of pie chart

snipgenie.plotting.gen_colors(cmap, n, reverse=False)[source]

Generates n distinct colors from a given colormap.

Parameters
  • cmap – The name of the colormap you want to use. Refer to https://matplotlib.org/stable/tutorials/colors/colormaps.html to choose. Suggestions: for metallicity in astrophysics, use coolwarm, bwr or seismic in reverse; for distinct objects, use gnuplot, brg, jet or turbo.

  • n (int) – Number of colors you want from the cmap you entered.

  • reverse (bool) – False by default. Set it to True if you want the cmap result to be reversed.

Returns

A list with hex values of colors.

Return type

colorlist(list)

Taken from the mycolorpy package by binodbhttr. See also https://matplotlib.org/stable/tutorials/colors/colormaps.html

snipgenie.plotting.get_bam_aln(bam_file, chr, start, end, group=False)[source]

Get all aligned reads from a sorted bam file within the given coords

snipgenie.plotting.get_chrom_from_bam(bam_file)[source]

Get first sequence name in a bam file

snipgenie.plotting.get_color_mapping(df, col, cmap=None, seed=1)[source]

Get random color map for categorical dataframe column

snipgenie.plotting.get_coverage(bam_file, chr, start, end)[source]

Get coverage from bam file at specified region
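
What "coverage at a region" means can be sketched with plain intervals in place of a bam file. The helper region_depth and the half-open (start, end) read tuples are assumptions made for illustration; the actual function reads alignments from the bam file and its output format is not described here.

```python
def region_depth(reads, start, end):
    """Per-base read depth over [start, end).

    `reads` is an iterable of half-open (read_start, read_end) intervals.
    Hypothetical helper standing in for alignments read from a bam file.
    """
    depth = [0] * (end - start)
    for read_start, read_end in reads:
        # Clip each read to the requested window before counting.
        lo, hi = max(read_start, start), min(read_end, end)
        for p in range(lo, hi):
            depth[p - start] += 1
    return depth


reads = [(0, 5), (3, 8), (7, 12)]
depth = region_depth(reads, 2, 10)
print(depth)
```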

snipgenie.plotting.get_fasta_length(filename, key=None)[source]

Get length of reference sequence

snipgenie.plotting.get_fasta_names(filename)[source]

Get names of fasta sequences

snipgenie.plotting.get_fasta_sequence(filename, start, end, key=0)[source]

Get chunk of indexed fasta sequence at start/end points

snipgenie.plotting.heatmap(df, cmap='gist_gray_r', w=15, h=5, ax=None)[source]

Plot dataframe matrix

snipgenie.plotting.make_legend(fig, colormap, loc=(1.05, 0.6), title='', fontsize=12)[source]

Make a figure legend with provided color mapping

snipgenie.plotting.plot_bam_alignment(bam_file, chr, xstart, xend, ystart=0, yend=100, rect_height=0.6, fill_color='gray', ax=None)[source]

Bam alignments plotter.

Parameters
  • bam_file – name of a sorted bam file

  • xstart – start of range to show

  • xend – end of range

snipgenie.plotting.plot_coverage(df, plot_width=800, plot_height=60, xaxis=True, ax=None)[source]

Plot a bam coverage dataframe returned from get_coverage.

Parameters
  • df – dataframe of coverage data (from get_coverage)

  • plot_width – width of plot

  • xaxis – plot the x-axis ticks and labels

snipgenie.plotting.plot_features(rec, ax, rows=3, xstart=0, xend=30000)[source]
snipgenie.plotting.random_colors(n=10, seed=1)[source]

Generate random hex colors as list of length n.
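
A seeded generator of hex colors can be sketched with the standard library alone. The helper random_hex_colors is hypothetical and only shows the general idea of reproducible random colors; the values it produces are not claimed to match those of random_colors.

```python
import random


def random_hex_colors(n=10, seed=1):
    """Return n random '#rrggbb' strings, reproducible for a given seed.

    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)          # private generator, global state untouched
    return ["#{:06x}".format(rng.randrange(0x1000000)) for _ in range(n)]


colors = random_hex_colors(5, seed=1)
print(colors)
```

Using a private random.Random instance keeps the result stable across calls without reseeding the module-level generator that other code may rely on.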

snipgenie.plotting.show_colors(colors)[source]

Display a list of colors

snipgenie.trees module

Tree methods for bacterial phylogenetics, mostly using ete3. Created Nov 2019 Copyright (C) Damien Farrell

snipgenie.trees.biopython_draw_tree(filename)[source]
snipgenie.trees.color_leaves(t, colors, color_bg=False)[source]
snipgenie.trees.colors_from_labels(df, name, group)[source]

Colors from dataframe columns for use with an ete3 tree drawing

snipgenie.trees.convert_branch_lengths(treefile, outfile, snps)[source]
snipgenie.trees.create_tree(filename=None, tree=None, ref=None, labelmap=None, colormap=None, color_bg=False, format=1)[source]

Draw a tree

snipgenie.trees.delete_nodes(t, names)[source]
snipgenie.trees.format_nodes(t)[source]
snipgenie.trees.get_clusters(tree)[source]

Get snp clusters from newick tree using TreeCluster.py

snipgenie.trees.get_colormap(values)[source]
snipgenie.trees.remove_nodes(tree, names)[source]
snipgenie.trees.remove_tiplabels(t)[source]
snipgenie.trees.run_RAXML(infile, name='variants', threads=8, bootstraps=100, outpath='.')[source]

Run RAxML pthreads.

Returns

Name of .tree file.

snipgenie.trees.run_fasttree(infile, outpath, bootstraps=100)[source]

Run fasttree

snipgenie.trees.run_treecluster(f, threshold, method='max_clade')[source]

Run treecluster on a newick tree. Clustering method options: avg_clade, length, length_clade, max, max_clade, med_clade, root_dist, single_linkage_clade (default: max_clade).

See https://github.com/niemasd/TreeCluster
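
A minimal sketch of validating the method name and assembling an argument list follows. The helper treecluster_args is hypothetical and the -i, -t and -m flags are taken as an assumption from TreeCluster's command-line help (check with TreeCluster.py -h); nothing is executed, and this is not claimed to be the command run_treecluster builds.

```python
# Method names as listed in the docstring above.
METHODS = {
    "avg_clade", "length", "length_clade", "max", "max_clade",
    "med_clade", "root_dist", "single_linkage_clade",
}


def treecluster_args(treefile, threshold, method="max_clade"):
    """Return an argument list for TreeCluster.py (hypothetical helper)."""
    if method not in METHODS:
        raise ValueError(f"unknown clustering method: {method}")
    # Flag names are an assumption; verify against `TreeCluster.py -h`.
    return ["TreeCluster.py", "-i", treefile, "-t", str(threshold), "-m", method]


args = treecluster_args("tree.newick", 12)
print(" ".join(args))
```

A list of arguments like this can be handed to subprocess.run without going through a shell, which avoids quoting problems in file names.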

snipgenie.trees.set_nodesize(t, size=12)[source]

Change the node size

snipgenie.trees.set_tiplabels(t, labelmap)[source]
snipgenie.trees.toytree_draw(tre, meta, labelcol, colorcol)[source]

Draw colored tree with toytree

snipgenie.simulate module

Simulate reads. Created Sep 2022 Copyright (C) Damien Farrell

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

snipgenie.simulate.artificial_fastq_generator(ref, outfile, cmp=100)[source]

Generate reads from reference
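
The general idea of sampling reads from a reference can be sketched in pure Python. The helper simulate_reads, its parameters and the constant quality string are all assumptions made for illustration; the real function may rely on an external tool, and the meaning of its cmp argument is not documented here.

```python
import random


def simulate_reads(ref_seq, n_reads=5, read_len=20, seed=1):
    """Return FASTQ records (4 lines each) sampled uniformly from ref_seq.

    Hypothetical helper: error-free reads with a constant quality score.
    """
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        start = rng.randrange(len(ref_seq) - read_len + 1)
        seq = ref_seq[start:start + read_len]
        qual = "I" * read_len          # 'I' is Phred+33 quality 40
        records.append(f"@read{i}_{start}\n{seq}\n+\n{qual}\n")
    return records


ref = "ACGTTGCAAGCTTAGGCTAACGTTAGCCGATACGATCGGATCCA"
records = simulate_reads(ref, n_reads=3, read_len=10)
print("".join(records))
```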

snipgenie.simulate.generate_fastqs(infile, outpath, reads=100000.0, overwrite=False)[source]

Make multiple fastqs

snipgenie.simulate.run_phastsim(path, ref, newick)[source]

Run phastsim

Module contents