pymodulon.compare
Module Contents
Functions
_get_orthologous_imodulons | Given two M matrices, returns the dot graph and name links of the various connected ICA components
_make_dot_graph | Given a set of links between M matrices, generates a dot graph of the various connected iModulons
convert_gene_index | Reorganizes and renames genes in a dataframe to be consistent with another object/organism
compare_ica | Compares two M matrices between a single organism or across organisms and returns the connected iModulons
make_prots | Makes protein files for all the genes in the genbank file
make_prot_db | Creates GenBank Databases from Protein FASTA of an organism
get_bbh | Runs Bidirectional Best Hit BLAST to find orthologs utilizing two protein FASTA files
Computes gene lengths
Runs BLASTP between two organisms
Checks inputs are acceptable
Checks that inputs are the same
pymodulon.compare._get_orthologous_imodulons(M1, M2, method, cutoff)
Given two M matrices, returns the dot graph and name links of the various connected ICA components
- Parameters
- Returns
links – Links and distances of connected iModulons
- Return type
pymodulon.compare._make_dot_graph(links, show_all, names1, names2)
Given a set of links between M matrices, generates a dot graph of the various connected iModulons
- Parameters
- Returns
dot – Dot graph of connected iModulons
- Return type
Digraph
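To make the idea of a dot graph of connected iModulons concrete, here is a minimal sketch that turns a list of links into DOT source text. It is not the library's implementation: the function name `links_to_dot`, the `org1_`/`org2_` node prefixes, and the `(name1, name2, distance)` tuple layout are assumptions chosen for illustration, and it returns a plain string rather than a Digraph object.

```python
def links_to_dot(links, names1=(), names2=(), show_all=False):
    """Build DOT source for links between two sets of iModulons.

    links: iterable of (name1, name2, distance) tuples (assumed layout).
    names1, names2: all iModulon names in each set; only consulted
    when show_all is True, to draw unlinked nodes as well.
    """
    nodes1, nodes2 = [], []
    for name1, name2, _ in links:
        if name1 not in nodes1:
            nodes1.append(name1)
        if name2 not in nodes2:
            nodes2.append(name2)
    if show_all:
        # Also draw iModulons that have no link at all.
        nodes1 += [n for n in names1 if n not in nodes1]
        nodes2 += [n for n in names2 if n not in nodes2]

    lines = ["digraph imodulons {"]
    # Prefix node IDs so identical names in both sets stay distinct.
    for n in nodes1:
        lines.append(f'  "org1_{n}" [label="{n}"];')
    for n in nodes2:
        lines.append(f'  "org2_{n}" [label="{n}"];')
    for name1, name2, dist in links:
        lines.append(f'  "org1_{name1}" -> "org2_{name2}" [label="{dist:.2f}"];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be rendered with any Graphviz tool. With `show_all=False`, only iModulons that take part in at least one link appear in the graph.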
pymodulon.compare.convert_gene_index(df1, df2, ortho_file=None, keep_locus=False)
Reorganizes and renames genes in a dataframe to be consistent with another object/organism
- Parameters
- Returns
df1_new (pandas.DataFrame) – M matrix for organism 1 with indexes translated into orthologs
df2_new (pandas.DataFrame) – M matrix for organism 2 with indexes translated into orthologs
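The index translation described above can be pictured with plain dictionaries standing in for the two DataFrames. This is a sketch under stated assumptions, not the library's code: `translate_index` and the shape of `orthologs` (a mapping from organism 1 gene IDs to organism 2 gene IDs) are hypothetical, and restricting both outputs to the shared genes is an assumption about the intended result.

```python
def translate_index(rows1, rows2, orthologs):
    """Rename organism 1 genes to their organism 2 orthologs.

    rows1, rows2: dict mapping gene ID -> list of iModulon weights.
    orthologs: dict mapping organism 1 gene ID -> organism 2 gene ID
               (hypothetical stand-in for an orthology file).
    Returns two dicts that share the same gene IDs in the same order.
    """
    # Rename organism 1 genes; genes without an ortholog are dropped.
    renamed = {orthologs[g]: w for g, w in rows1.items() if g in orthologs}
    # Keep only genes present in both organisms, in organism 2 order.
    shared = [g for g in rows2 if g in renamed]
    new1 = {g: renamed[g] for g in shared}
    new2 = {g: rows2[g] for g in shared}
    return new1, new2
```

After this step both tables are indexed by the same gene IDs, so their columns can be compared row by row.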
pymodulon.compare.compare_ica(M1, M2, ortho_file=None, cutoff=0.25, method='pearson', plot=True, show_all=False)
Compares two M matrices between a single organism or across organisms and returns the connected iModulons
- Parameters
M1 (DataFrame) – M matrix from the first organism
M2 (DataFrame) – M matrix from the second organism
ortho_file (str, optional) – Path to orthology file between organisms (default: None)
cutoff (float) – Cutoff value for correlation metric (default: 0.25)
method (str or Callable) – Correlation metric to use from {‘pearson’, ‘kendall’, ‘spearman’} or callable (see corr())
plot (bool) – Create dot plot of matches (default: True)
show_all (bool) – Show all iModulons regardless of their linkage (default: False)
- Returns
matches (list) – Links and distances of connected iModulons
dot (Digraph) – Dot graph of connected iModulons
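As a rough illustration of how a correlation cutoff can link components from two M matrices, the sketch below compares gene-weight vectors stored as plain dictionaries rather than DataFrames. The helper names `pearson` and `link_components` are hypothetical, and taking the absolute correlation and comparing it with `>=` are assumptions, not a description of what `compare_ica` does internally.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant vector: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def link_components(m1, m2, cutoff=0.25):
    """Link components whose gene weights correlate above a cutoff.

    m1, m2: dict mapping component name -> {gene ID: weight}.
    Returns a list of (name1, name2, correlation) tuples.
    """
    links = []
    for name1, weights1 in m1.items():
        for name2, weights2 in m2.items():
            # Only genes present in both components can be compared.
            shared = sorted(set(weights1) & set(weights2))
            if len(shared) < 2:
                continue
            r = abs(pearson([weights1[g] for g in shared],
                            [weights2[g] for g in shared]))
            if r >= cutoff:  # assumption: inclusive threshold
                links.append((name1, name2, r))
    return links
```

With the library itself, the equivalent call takes the form `compare_ica(M1, M2, cutoff=0.25, method='pearson')`, using the parameters listed above.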
pymodulon.compare.make_prots(gbk, out_path, lt_key='locus_tag')
Makes protein files for all the genes in the genbank file
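For orientation, a protein file of this kind is typically a FASTA file with one record per gene, keyed by its locus tag. The sketch below shows only the formatting step and skips GenBank parsing entirely. The function name `format_protein_fasta` and its `(locus_tag, sequence)` input pairs are hypothetical stand-ins for the values read from the GenBank file, and the 60-character line width is an arbitrary choice.

```python
def format_protein_fasta(proteins, width=60):
    """Format (locus_tag, amino_acid_sequence) pairs as FASTA text.

    proteins: iterable of (locus_tag, sequence) pairs (assumed input;
              parsing these out of a GenBank file is not shown).
    width: maximum number of residues per sequence line.
    """
    lines = []
    for locus_tag, seq in proteins:
        lines.append(f">{locus_tag}")
        # Wrap the sequence into fixed-width lines.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"
```

The returned string can be written to `out_path` with an ordinary file write.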
pymodulon.compare.make_prot_db(fasta_file, outname=None, combined='combined.fa')
Creates GenBank Databases from Protein FASTA of an organism
- Parameters
- Returns
None
- Return type
pymodulon.compare.get_bbh(db1, db2, outdir='bbh', outname=None, mincov=0.8, evalue=0.001, threads=1, force=False, savefiles=True)
Runs Bidirectional Best Hit BLAST to find orthologs utilizing two protein FASTA files. Outputs a CSV file of all orthologous genes.
- Parameters
db1 (str) – Path to protein FASTA file for organism 1
db2 (str) – Path to protein FASTA file for organism 2
outdir (str) – Path to output directory (default: “bbh”)
outname (str) – Name of output CSV file (default: <db1>_vs_<db2>_parsed.csv)
mincov (float) – Minimum coverage to call hits in BLAST, must be between 0 and 1 (default: 0.8)
evalue (float) – E-value threshold for BLAST hits (default: 0.001)
threads (int) – Number of threads to use for BLAST (default: 1)
force (bool) – If True, overwrite existing files (default: False)
savefiles (bool) – If True, save files to ‘outdir’ (default: True)
- Returns
out – Table of bi-directional BLAST hits between the two organisms
- Return type
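The bidirectional best hit idea behind `get_bbh` can be sketched without running BLAST: take the best hit for each gene in both search directions and keep only the pairs that point at each other. The code below is an illustration under stated assumptions, not the library's implementation. The helper names, the dictionary keys (`query`, `subject`, `coverage`, `evalue`, `bitscore`), and ranking hits by bit score are all assumptions; only the `mincov` and `evalue` thresholds and their defaults come from the parameter list above.

```python
def best_hits(hits, mincov=0.8, evalue=0.001):
    """Return {query: subject} for the top-scoring hit of each query.

    hits: iterable of dicts with keys 'query', 'subject', 'coverage',
          'evalue', and 'bitscore' (hypothetical record layout).
    """
    best = {}
    for hit in hits:
        # Discard hits that fail the coverage or E-value thresholds.
        if hit["coverage"] < mincov or hit["evalue"] > evalue:
            continue
        current = best.get(hit["query"])
        # Assumption: the best hit is the one with the highest bit score.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return {query: hit["subject"] for query, hit in best.items()}


def bidirectional_best_hits(hits_1v2, hits_2v1, mincov=0.8, evalue=0.001):
    """Keep gene pairs that are each other's best hit in both directions."""
    forward = best_hits(hits_1v2, mincov, evalue)
    reverse = best_hits(hits_2v1, mincov, evalue)
    return sorted((g1, g2) for g1, g2 in forward.items()
                  if reverse.get(g2) == g1)
```

Requiring agreement in both directions is what filters out one-sided matches, such as a gene whose best hit is itself better matched by a different gene.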