pymodulon.imodulondb

Functions for writing a directory for iModulonDB webpages

Module Contents

Functions

imodulondb_compatibility(model, inplace=False, tfcomplex_to_gene=None)

Checks for all issues and missing information prior to exporting to iModulonDB.

imodulondb_export(model, path='.', cat_order=None, tfcomplex_to_gene=None, skip_iMs=False, skip_genes=False)

Generates the iModulonDB page for the data and exports to the path.

imdb_dataset_table(model)

Converts the model’s imodulondb_table into dataset metadata for the gray box on the left side of the dataset page

imdb_iM_table(imodulon_table, cat_order=None)

Reformats the iModulon table to have the columns expected by iModulonDB

imdb_gene_presence(model)

Generates the two versions of the gene presence file, one as a binary matrix, and one as a DataFrame

imodulondb_main_site_files(model, path_prefix='.', rewrite_annotations=True, cat_order=None)

Generates all parts of the site that do not require large iteration loops

imdb_generate_im_files(model, path_prefix='.', gene_scatter_x='start', tfcomplex_to_gene=None)

Generates all files for all iModulons in the model

imdb_generate_gene_files(model, path_prefix='.')

Generates all files for all genes in the IcaData object

parse_tf_string(model, tf_str, verbose=False)

Returns a list of relevant tfs from a string. Will ignore TFs not in the trn file. iModulonDB helper function.

imdb_gene_table_df(model, k)

Creates the gene table dataframe for iModulonDB

_component_DF(model, k, tfs=None)

Helper function for imdb_gene_hist_df

_tf_combo_string(row)

Creates a formatted string for the histogram legends. Helper function for imdb_gene_hist_df.

_sort_tf_strings(tfs, unique_elts)

Sorts TF strings for the legend of the histogram. Helper function for imdb_gene_hist_df.

imdb_gene_hist_df(model, k, bins=20, tol=0.001)

Creates the gene histogram for an iModulon

_gene_color_dict(model)

Helper function to match genes to colors based on COG. Used by imdb_gene_scatter_df.

imdb_gene_scatter_df(model, k, gene_scatter_x='start')

Generates a dataframe for the gene scatter plot in iModulonDB

generate_n_replicates_column(model)

Generates the “n_replicates” column of the sample_table for iModulonDB.

imdb_activity_bar_df(model, k)

Generates a dataframe for the activity bar graph of iModulon k

_parse_regulon_string(model, s)

The Bacillus microarray dataset uses [] to create unusually complicated TF strings. This function parses those, as a helper to _get_reg_genes for imdb_regulon_venn_df.

_get_reg_genes(model, tf)

Finds the set of genes regulated by the boolean combination of regulators in a TF string

imdb_regulon_venn_df(model, k)

Generates a dataframe for the regulon venn diagram of iModulon k. Returns None if there is no diagram to draw

get_tfs_to_scatter(model, tf_string, tfcomplex_to_genename=None, verbose=False)

Returns a list of gene loci for the TFs in tf_string

imdb_regulon_scatter_df(model, k, tfcomplex_to_genename=None)

Generates a dataframe for the regulon scatter plots of iModulon k

tf_with_links(model, tf_str)

Adds links to the regulator string

tf_with_links_brackets(model, tf_str)

Adds links to the regulator string

imdb_imodulon_basics_df(model, k, reg_venn, reg_scatter)

Generates a dataframe for the metadata of iModulon k

make_im_directory(model, k, path_prefix='.', gene_scatter_x='start', tfcomplex_to_genename=None)

Generates all data for iModulon k, stores it in an ‘iModulon_files/k/’ subdirectory of path_prefix

imdb_gene_activity_bar_df(model, gene_id)

Generates a dataframe for the activity bar graph of a gene

imdb_gene_im_table_df(model, g, im_table, m_bin)

Generates a dataframe for the iModulon table of gene g

imdb_gene_basics_df(model, g)

Generates a dataframe for the metadata of gene g

make_gene_directory(model, g, path_prefix='.')

Generates all data for gene g, stores it in a subfolder of path_prefix

pymodulon.imodulondb.imodulondb_compatibility(model, inplace=False, tfcomplex_to_gene=None)

Checks for all issues and missing information prior to exporting to iModulonDB. If inplace = True, modifies the model (not recommended for main model variables).

Parameters
  • model (IcaData) – IcaData object to check

  • inplace (bool, optional) – If true, modifies the model to prepare for export. Not recommended for use with your main model variable.

  • tfcomplex_to_gene (dict, optional) – Dictionary pointing complex TRN entries to matching gene names in the gene table (e.g. {"FlhDC": "flhD"})

Returns

  • table_issues (pd.DataFrame) – Each row corresponds to an issue with one of the main class elements. Columns:

      • Table: which table or other variable the issue is in

      • Missing Column: the column of the Table with the issue (not case sensitive; capitalization is ignored)

      • Solution: unless “CRITICAL” is in this cell, the site behavior if the issue remained is described here

  • tf_issues (pd.DataFrame) – Each row corresponds to a regulator that is used in the imodulon_table. Columns:

      • in_trn: whether the regulator is in the model.trn. Regulators not in the TRN will be ignored in the site’s histograms and gene tables.

      • has_link: whether the regulator has a link in tf_links. If not, no link to external regulator databases will be shown.

      • has_gene: whether the regulator can be matched to a gene in the model. If this is false, then there will be no regulator scatter plot on the site. You can link TF complexes to one of their genes using the tfcomplex_to_gene input.

  • missing_g_links (pd.Series) – The genes on this list don’t have links in the gene_links. The gene pages for these genes will not display links.

  • missing_DOIs (pd.Series) – The samples listed here don’t have DOIs in the sample_table. Clicking on their associated bars in the activity plots will not link to relevant papers.
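
As a rough illustration of how these return values might be used, the sketch below runs the check and only exports when no Solution cell contains “CRITICAL”. The helper name `check_then_export` is hypothetical, and the sketch assumes that the four results come back as a tuple in the order listed above and that a “CRITICAL” entry should block the export; neither assumption is confirmed by this page.

```python
def check_then_export(model, path=".", tfcomplex_to_gene=None):
    """Run the compatibility check, then export only if nothing is CRITICAL.

    Hypothetical helper (not part of pymodulon). Assumes the four results
    are returned as a tuple in the documented order.
    """
    # Imported lazily so the sketch can be defined without pymodulon installed.
    from pymodulon.imodulondb import imodulondb_compatibility, imodulondb_export

    table_issues, tf_issues, missing_g_links, missing_dois = imodulondb_compatibility(
        model, inplace=False, tfcomplex_to_gene=tfcomplex_to_gene
    )

    # Assumption: "CRITICAL" in a Solution cell marks an issue that must be fixed.
    critical = [s for s in table_issues["Solution"] if "CRITICAL" in str(s)]
    if critical:
        raise ValueError(f"{len(critical)} critical issue(s) must be fixed before export")

    imodulondb_export(model, path=path, tfcomplex_to_gene=tfcomplex_to_gene)
    # Hand back the non-blocking reports so the caller can review them.
    return tf_issues, missing_g_links, missing_dois
```

Keeping inplace=False here follows the recommendation above not to modify the main model variable.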

pymodulon.imodulondb.imodulondb_export(model, path='.', cat_order=None, tfcomplex_to_gene=None, skip_iMs=False, skip_genes=False)

Generates the iModulonDB page for the data and exports to the path. If certain columns are unavailable but can be filled in automatically, they will be.

Parameters
  • model (IcaData) – IcaData object to export

  • path (str, optional) – Path to iModulonDB main hosting folder (default = “.”)

  • cat_order (list, optional) – List of categories in the imodulon_table, ordered as you would like them to appear in the dataset table (default = None)

  • tfcomplex_to_gene (dict, optional) – Dictionary pointing complex TRN entries to matching gene names in the gene table (e.g. {"FlhDC": "flhD"})

  • skip_iMs (bool, optional) – If this is True, do not output iModulon files (to save time)

  • skip_genes (bool, optional) – If this is True, do not output gene files (to save time)

Returns

None

Return type

None

pymodulon.imodulondb.imdb_dataset_table(model)

Converts the model’s imodulondb_table into dataset metadata for the gray box on the left side of the dataset page

Parameters

model (IcaData) – An IcaData object

Returns

res – A series of formatted metadata

Return type

Series

pymodulon.imodulondb.imdb_iM_table(imodulon_table, cat_order=None)

Reformats the iModulon table to have the columns expected by iModulonDB

Parameters
  • imodulon_table (DataFrame) – Table formatted similar to IcaData.imodulon_table

  • cat_order (list, optional) – List of categories in imodulon_table.category, ordered as desired

Returns

im_table – New iModulon table with the columns expected by iModulonDB

Return type

DataFrame

pymodulon.imodulondb.imdb_gene_presence(model)

Generates the two versions of the gene presence file, one as a binary matrix, and one as a DataFrame

Parameters

model (IcaData) – An IcaData object

Returns

  • mbin (pandas.DataFrame) – Binarized M matrix

  • mbin_list (pandas.DataFrame) – Table mapping genes to iModulons
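
The relationship between the two outputs can be pictured with a small stand-alone sketch. It uses nested dictionaries rather than the DataFrames the function returns, and the helper name `presence_list` and the toy gene and iModulon labels are illustrative only.

```python
def presence_list(m_binarized):
    """Turn a binary gene-by-iModulon matrix into (gene, iModulon) pairs.

    m_binarized: dict mapping gene -> dict mapping iModulon -> 0 or 1.
    This only illustrates the matrix-versus-list idea; it is not the
    library's implementation.
    """
    return [
        (gene, im)
        for gene, row in m_binarized.items()
        for im, present in row.items()
        if present  # keep only the cells set to 1
    ]


# Toy binarized matrix: two genes, two iModulons.
mbin = {
    "geneA": {"iM_1": 1, "iM_2": 0},
    "geneB": {"iM_1": 1, "iM_2": 1},
}
pairs = presence_list(mbin)
```

The list form carries one row per gene and iModulon membership, which is usually far smaller than the full matrix because most cells are 0.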

pymodulon.imodulondb.imodulondb_main_site_files(model, path_prefix='.', rewrite_annotations=True, cat_order=None)

Generates all parts of the site that do not require large iteration loops

Parameters
  • model (IcaData) – IcaData object

  • path_prefix (str, optional) – Main folder for iModulonDB files (default = “.”)

  • rewrite_annotations (bool, optional) – Set to False if the gene_table and trn are unchanged (default = True)

  • cat_order (list, optional) – List of categories in model.imodulon_table.category, ordered as you want them to appear on the dataset page (default = None)

Returns

main_folder – Dataset folder, for use as the path_prefix in imdb_generate_im_files()

Return type

str
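
Because the returned folder is meant to be passed on as path_prefix, the export can also be run stage by stage. The sketch below is one possible way to chain the calls using only the signatures documented on this page; `export_in_stages` is a hypothetical helper, and reusing the same folder for the gene files is an assumption based on both functions describing path_prefix as the dataset folder.

```python
def export_in_stages(model, path_prefix=".", cat_order=None, tfcomplex_to_gene=None):
    """Write the main site files, then the iModulon files, then the gene files.

    Hypothetical helper (not part of pymodulon) that chains three documented
    functions and returns the dataset folder.
    """
    # Imported lazily so the sketch can be defined without pymodulon installed.
    from pymodulon.imodulondb import (
        imdb_generate_gene_files,
        imdb_generate_im_files,
        imodulondb_main_site_files,
    )

    # Stage 1: everything that does not need a loop over iModulons or genes.
    main_folder = imodulondb_main_site_files(
        model, path_prefix=path_prefix, rewrite_annotations=True, cat_order=cat_order
    )

    # Stage 2: per-iModulon files, written under the dataset folder.
    imdb_generate_im_files(
        model,
        path_prefix=main_folder,
        gene_scatter_x="start",
        tfcomplex_to_gene=tfcomplex_to_gene,
    )

    # Stage 3: per-gene files (assumed to share the same dataset folder).
    imdb_generate_gene_files(model, path_prefix=main_folder)
    return main_folder
```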

pymodulon.imodulondb.imdb_generate_im_files(model, path_prefix='.', gene_scatter_x='start', tfcomplex_to_gene=None)

Generates all files for all iModulons in the model

Parameters
  • model (IcaData) – IcaData object

  • path_prefix (str, optional) – Dataset folder in which to store the files (default = “.”)

  • gene_scatter_x (str) – Column from the gene table that specifies what to use on the X-axis of the gene scatter plot (default = “start”)

  • tfcomplex_to_gene (dict, optional) – Dictionary pointing complex TRN entries to matching gene names in the gene table (e.g. {"FlhDC": "flhD"})

pymodulon.imodulondb.imdb_generate_gene_files(model, path_prefix='.')

Generates all files for all genes in the IcaData object

Parameters
  • model (IcaData) – IcaData object

  • path_prefix (str, optional) – Dataset folder in which to store the files (default = “.”)

Returns

None

Return type

None

pymodulon.imodulondb.parse_tf_string(model, tf_str, verbose=False)

Returns a list of relevant tfs from a string. Will ignore TFs not in the trn file. iModulonDB helper function.

Parameters
  • model (IcaData) – IcaData object

  • tf_str (str) – String of tfs joined by ‘+’ and ‘/’ operators

  • verbose (bool, optional) – Whether or not to print outputs

Returns

tfs – List of relevant TFs

Return type

list
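
The splitting and filtering described above can be sketched in plain Python. This is an illustration of the idea, not the library's code: `split_tf_string` is a hypothetical name, and the set of known TFs stands in for whatever lookup the real function performs against the model's TRN.

```python
import re


def split_tf_string(tf_str, known_tfs):
    """Split a TF string on '+' and '/' and keep only TFs found in known_tfs.

    known_tfs stands in for the regulators present in the TRN (an assumption
    of this sketch). Non-string input such as NaN yields an empty list.
    """
    if not isinstance(tf_str, str):
        return []
    # Split on either operator and trim surrounding whitespace.
    names = [name.strip() for name in re.split(r"[+/]", tf_str)]
    # Drop empty fragments and TFs that are not in the TRN.
    return [name for name in names if name and name in known_tfs]


tfs = split_tf_string("Fur+OxyR/SoxS", {"Fur", "SoxS"})
```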

pymodulon.imodulondb.imdb_gene_table_df(model, k)

Creates the gene table dataframe for iModulonDB

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

Returns

res – DataFrame of the gene table that is compatible with iModulonDB

Return type

DataFrame

pymodulon.imodulondb._component_DF(model, k, tfs=None)

Helper function for imdb_gene_hist_df

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

  • tfs (list) – List of TFs (default = None)

Returns

gene_table – Gene table for the iModulon

Return type

DataFrame

pymodulon.imodulondb._tf_combo_string(row)

Creates a formatted string for the histogram legends. Helper function for imdb_gene_hist_df.

Parameters

row (Series) – Boolean series indexed by TFs for a given gene

Returns

A string formatted for display (i.e. “Regulated by …”)

Return type

str

pymodulon.imodulondb._sort_tf_strings(tfs, unique_elts)

Sorts TF strings for the legend of the histogram. Helper function for imdb_gene_hist_df.

Parameters
  • tfs (list[str]) – Sequence of TFs in the desired order

  • unique_elts (list[str]) – All combination strings made by _tf_combo_string

Returns

A sorted list of combination strings that have a consistent ordering

Return type

list[str]

pymodulon.imodulondb.imdb_gene_hist_df(model, k, bins=20, tol=0.001)

Creates the gene histogram for an iModulon

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

  • bins (int) – Number of bins in the histogram (default = 20)

  • tol (float) – Distance to threshold for deciding if a bar is in the iModulon (default = .001)

Returns

gene_hist_table – A dataframe for producing the histogram that is compatible with iModulonDB

Return type

DataFrame

pymodulon.imodulondb._gene_color_dict(model)

Helper function to match genes to colors based on COG. Used by imdb_gene_scatter_df.

Parameters

model (IcaData) – IcaData object

Returns

Dictionary associating gene names to colors

Return type

dict

pymodulon.imodulondb.imdb_gene_scatter_df(model, k, gene_scatter_x='start')

Generates a dataframe for the gene scatter plot in iModulonDB

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

  • gene_scatter_x (str) – Determines x-axis of the scatterplot

Returns

res – A dataframe for producing the scatterplot

Return type

DataFrame

pymodulon.imodulondb.generate_n_replicates_column(model)

Generates the “n_replicates” column of the sample_table for iModulonDB.

Parameters

model (IcaData) – IcaData object. Will overwrite the existing column if it exists.

Returns

None

Return type

None
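
One plausible way to compute such a column is to count the samples that share a condition within a project. The sketch below shows that counting on a list of dictionaries instead of the sample_table; the grouping keys "project" and "condition" are assumptions of this example, and `add_n_replicates` is a hypothetical name.

```python
from collections import Counter


def add_n_replicates(samples):
    """Add an "n_replicates" entry to each sample, in place.

    samples: list of dicts with "project" and "condition" keys (assumed
    grouping keys). Any existing "n_replicates" value is overwritten.
    """
    # Count how many samples share each (project, condition) pair.
    counts = Counter((s["project"], s["condition"]) for s in samples)
    for s in samples:
        s["n_replicates"] = counts[(s["project"], s["condition"])]


samples = [
    {"project": "p1", "condition": "control"},
    {"project": "p1", "condition": "control"},
    {"project": "p1", "condition": "treated"},
]
add_n_replicates(samples)
```

Like the documented function, the sketch modifies its input and returns None.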

pymodulon.imodulondb.imdb_activity_bar_df(model, k)

Generates a dataframe for the activity bar graph of iModulon k

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

Returns

res – A dataframe for producing the activity bar graph for iModulonDB

Return type

DataFrame

pymodulon.imodulondb._parse_regulon_string(model, s)

The Bacillus microarray dataset uses [] to create unusually complicated TF strings. This function parses those, as a helper to _get_reg_genes for imdb_regulon_venn_df.

Parameters
  • model (IcaData) – IcaData object

  • s (str) – TF string

Returns

res – Set of genes regulated by this string

Return type

set

pymodulon.imodulondb._get_reg_genes(model, tf)

Finds the set of genes regulated by the boolean combination of regulators in a TF string

Parameters
  • model (IcaData) – IcaData object

  • tf (str) – string of TFs separated by +, /, and/or []

Returns

reg_genes – Set of regulated genes

Return type

set[str]
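
To make the idea of a boolean combination concrete, here is a minimal sketch that assumes '+' means AND (intersection of regulons) and '/' means OR (union of regulons). It takes a plain dictionary of regulons instead of the model's TRN, handles only strings that use a single operator type, and does not attempt the bracket syntax; `regulated_genes` is a hypothetical name.

```python
def regulated_genes(tf_string, regulons):
    """Return the genes regulated by a simple TF string.

    regulons: dict mapping TF name -> set of gene IDs (stand-in for the TRN).
    Assumes '+' is AND and '/' is OR; mixed operators and '[]' are not handled.
    """
    tf_string = tf_string.strip()
    if "+" in tf_string:
        # AND: a gene must be regulated by every listed TF.
        sets = [regulons.get(tf.strip(), set()) for tf in tf_string.split("+")]
        return set.intersection(*sets)
    if "/" in tf_string:
        # OR: a gene may be regulated by any listed TF.
        sets = [regulons.get(tf.strip(), set()) for tf in tf_string.split("/")]
        return set.union(*sets)
    # Single regulator: return a copy of its regulon.
    return set(regulons.get(tf_string, set()))


regulons = {"Fur": {"g1", "g2"}, "OxyR": {"g2", "g3"}}
both = regulated_genes("Fur+OxyR", regulons)
either = regulated_genes("Fur/OxyR", regulons)
```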

pymodulon.imodulondb.imdb_regulon_venn_df(model, k)

Generates a dataframe for the regulon venn diagram of iModulon k. Returns None if there is no diagram to draw

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

Returns

res – A DataFrame for producing the venn diagram in iModulonDB, or None if there is no diagram to draw

Return type

DataFrame or None

pymodulon.imodulondb.get_tfs_to_scatter(model, tf_string, tfcomplex_to_genename=None, verbose=False)
Parameters
  • model (IcaData) – IcaData object

  • tf_string (str or nan) – String of TFs, or np.nan

  • tfcomplex_to_genename (dict, optional) – Dictionary pointing complex TRN entries to matching gene names in the gene table (e.g. {"FlhDC": "flhD"})

  • verbose (bool) – Show verbose output (default: False)

Returns

res – List of gene loci

Return type

list
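
The role of tfcomplex_to_genename can be illustrated with a stand-alone sketch that resolves TF names to gene loci. The matching rules used here (complex mapping first, then an exact gene name, then the name with its first letter lowercased) are assumptions of this example rather than documented behavior, and `tfs_to_loci` and the locus values are hypothetical.

```python
def tfs_to_loci(tfs, name_to_locus, tfcomplex_to_genename=None):
    """Resolve TF names to gene loci, skipping TFs that cannot be matched.

    name_to_locus: dict mapping gene name -> locus tag (stand-in for the
    gene table). The lookup order below is an assumption of this sketch.
    """
    tfcomplex_to_genename = tfcomplex_to_genename or {}
    loci = []
    for tf in tfs:
        candidates = [
            tfcomplex_to_genename.get(tf),  # explicit complex -> gene mapping
            tf,                             # exact gene name
            tf[:1].lower() + tf[1:],        # e.g. "Fur" -> "fur"
        ]
        for name in candidates:
            if name in name_to_locus:
                loci.append(name_to_locus[name])
                break  # stop at the first candidate that matches
    return loci


name_to_locus = {"flhD": "locus_1", "fur": "locus_2"}
loci = tfs_to_loci(
    ["FlhDC", "Fur", "Unknown"], name_to_locus, tfcomplex_to_genename={"FlhDC": "flhD"}
)
```

Without the mapping, a complex such as FlhDC would have no matching gene name, so it would be skipped.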

pymodulon.imodulondb.imdb_regulon_scatter_df(model, k, tfcomplex_to_genename=None)
Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

  • tfcomplex_to_genename (dict, optional) – Dictionary pointing complex TRN entries to matching gene names in the gene table (e.g. {"FlhDC": "flhD"})

Returns

res – A dataframe for producing the regulon scatter plots in iModulonDB

Return type

DataFrame

pymodulon.imodulondb.tf_with_links(model, tf_str)

Adds links to the regulator string

Parameters
  • model (IcaData) – IcaData object

  • tf_str (str or float) – Regulator string for a given iModulon, or np.nan

Returns

res – String with links added

Return type

str

pymodulon.imodulondb.tf_with_links_brackets(model, tf_str)

Adds links to the regulator string. Used with the complicated bracket system in the Bacillus microarray dataset.

Parameters
  • model (IcaData) – IcaData object

  • tf_str (str or float) – Regulator string for a given iModulon, or np.nan

Returns

res – String with links added

Return type

str

pymodulon.imodulondb.imdb_imodulon_basics_df(model, k, reg_venn, reg_scatter)

Generates a dataframe for the metadata of iModulon k

Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

  • reg_venn (DataFrame or None) – Output of imdb_regulon_venn_df(model, k)

  • reg_scatter (DataFrame or None) – Output of imdb_regulon_scatter_df(model, k)

Returns

res – A dataframe of metadata for iModulon k in iModulonDB

Return type

DataFrame

pymodulon.imodulondb.make_im_directory(model, k, path_prefix='.', gene_scatter_x='start', tfcomplex_to_genename=None)
Parameters
  • model (IcaData) – IcaData object

  • k (int or str) – iModulon name

  • path_prefix (str, optional) – Path to the dataset folder. This function creates an ‘iModulon_files/k/’ subdirectory there to store everything. (default = “.”)

  • gene_scatter_x (str) – Passed to imdb_gene_scatter_df() to indicate the x axis type of that plot (default = “start”)

  • tfcomplex_to_genename (dict, optional) – Dictionary pointing complex TRN entries to matching gene names in the gene table (e.g. {"FlhDC": "flhD"})

Returns

None

Return type

None

pymodulon.imodulondb.imdb_gene_activity_bar_df(model, gene_id)
Parameters
  • model (IcaData) – IcaData object

  • gene_id (str) – Locus tag of gene

Returns

res – A dataframe for the activity bar graph of the gene in iModulonDB

Return type

DataFrame

pymodulon.imodulondb.imdb_gene_im_table_df(model, g, im_table, m_bin)

Generates a dataframe for the iModulon table of gene g

Parameters
  • model (IcaData) – IcaData object

  • g (str) – Gene locus tag

  • im_table (DataFrame) – Pre-cleaned version of model.imodulon_table

  • m_bin (DataFrame) – Boolean transpose version of model.M_binarized

Returns

perGene_table – A dataframe for the iModulon table of gene g in iModulonDB

Return type

DataFrame

pymodulon.imodulondb.imdb_gene_basics_df(model, g)
Parameters
  • model (IcaData) – IcaData object

  • g (str) – Gene locus

Returns

res – A dataframe for the metadata of gene g in iModulonDB

Return type

DataFrame

pymodulon.imodulondb.make_gene_directory(model, g, path_prefix='.')

Generates all data for gene g, stores it in a subfolder of path_prefix

Parameters
  • model (IcaData) – IcaData object

  • g (str) – Gene locus

  • path_prefix (str, optional) – Path to the dataset folder. This function creates a ‘gene_page_files/k/’ subdirectory there to store everything. (default = “.”)

Returns

im_df – Table containing iModulon information for the gene

Return type

DataFrame