pymodulon.util

General utility functions for the pymodulon package

Module Contents

Functions

_check_table(table, name, index=None, index_col=0)

_check_table_helper(table, index, name)

_check_dict(table, index_col=0)

compute_threshold(ic, dagostino_cutoff)

Computes the D’Agostino-test-based threshold for a component of an M matrix

dima(ica_data, sample1, sample2, threshold=5, fdr=0.1, alternate_A=None)

Creates DIMA table of differentially expressed iModulons

_parse_sample(ica_data, sample)

Parses sample inputs into a list of sample IDs

explained_variance(ica_data, genes=None, samples=None, imodulons=None, reference=None)

Computes the fraction of variance explained by iModulons (from 0 to 1)

infer_activities(ica_data, data)

Infer iModulon activities for external data

mutual_info_distance(x, y)

mi(x, y, z=None, k=3, base=2, alpha=0)

Mutual information of x and y (conditioned on z if z is not None)

entropy(x, k=3, base=2)

The classic Kozachenko-Leonenko (K-L) k-nearest-neighbor continuous entropy estimator

add_noise(x, intens=1e-10)

Adds small random noise to x to break degeneracy (tied values).

build_tree(points)

query_neighbors(tree, x, k)

avgdigamma(points, dvec)

Finds the number of neighbors within some radius in the marginal space

lnc_correction(tree, points, k, alpha)

count_neighbors(tree, x, r)

pymodulon.util._check_table(table, name, index=None, index_col=0)

pymodulon.util._check_table_helper(table, index, name)

pymodulon.util._check_dict(table, index_col=0)

pymodulon.util.compute_threshold(ic, dagostino_cutoff)

Computes the D’Agostino-test-based threshold for a component of an M matrix

Parameters
  • ic (Series) – Pandas Series containing an independent component

  • dagostino_cutoff (int) – Minimum D’Agostino test statistic value used to determine the threshold

Returns

iModulon threshold – List of thresholds for each iModulon

Return type

list
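The description above can be read as an iterative outlier-removal procedure. The following is a minimal sketch of that reading, not the library's implementation: it assumes the threshold is found by dropping the largest-magnitude weights until the D’Agostino K² statistic of the remaining weights falls below the cutoff. The function name `threshold_sketch`, the NumPy-array input, and the `min_size` guard are all hypothetical.

```python
import numpy as np
from scipy import stats


def threshold_sketch(weights, dagostino_cutoff, min_size=20):
    """Illustrative threshold search for one component (assumed procedure)."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(np.abs(weights))      # indices sorted by ascending |weight|
    sorted_abs = np.abs(weights)[order]

    n_kept = len(weights)
    k2, _ = stats.normaltest(weights)        # D'Agostino K^2 statistic
    # Drop the largest remaining weight until the rest look normal enough.
    # min_size is a hypothetical guard so the test keeps a usable sample size.
    while k2 > dagostino_cutoff and n_kept > min_size:
        n_kept -= 1
        k2, _ = stats.normaltest(weights[order[:n_kept]])

    if n_kept == len(weights):
        return float(sorted_abs[-1])         # nothing was removed
    # Midpoint between the last kept and the first removed weight
    return float((sorted_abs[n_kept - 1] + sorted_abs[n_kept]) / 2)


rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.01, size=300)      # near-normal bulk of weights
outliers = np.array([0.2, -0.25, 0.3])            # a few strongly weighted genes
example_threshold = threshold_sketch(
    np.concatenate([background, outliers]), dagostino_cutoff=50
)
```

In this toy example the returned value falls between the bulk of the weights and the largest outlier, so weights whose magnitude exceeds it would be treated as members of the component.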

pymodulon.util.dima(ica_data, sample1, sample2, threshold=5, fdr=0.1, alternate_A=None)

Creates DIMA table of differentially expressed iModulons

Parameters
  • ica_data (IcaData) – IcaData data object

  • sample1 (str or list) – List of sample IDs or name of “project:condition”

  • sample2 (str or list) – List of sample IDs or name of “project:condition”

  • threshold (float) – Minimum activity difference to determine DiMAs (default = 5)

  • fdr (float) – False discovery rate (default = 0.1)

  • alternate_A (DataFrame) – Alternate A to use (default = None)

Returns

results – Table of differentially expressed iModulons

Return type

DataFrame
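The `threshold` and `fdr` parameters suggest a two-part filter: an iModulon is reported only if its activity difference is large enough and its corrected p-value is small enough. The sketch below illustrates that idea in plain Python. It assumes Benjamini-Hochberg as the correction (the text above does not say which procedure is used), and the names `benjamini_hochberg` and `select_dimas` and the input dictionaries are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


def select_dimas(differences, pvalues, threshold=5, fdr=0.1):
    """Keep iModulons with |difference| > threshold and q-value < fdr."""
    names = list(differences)
    qvalues = benjamini_hochberg([pvalues[name] for name in names])
    return [
        name
        for name, q in zip(names, qvalues)
        if abs(differences[name]) > threshold and q < fdr
    ]


# Made-up activity differences and p-values for four iModulons
differences = {"A": 8.0, "B": -6.5, "C": 2.0, "D": 12.0}
pvalues = {"A": 0.01, "B": 0.04, "C": 0.03, "D": 0.5}
selected = select_dimas(differences, pvalues)  # ["A", "B"]
```

Here "C" is dropped because its difference is below the threshold, and "D" is dropped because its adjusted p-value exceeds the false discovery rate.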

pymodulon.util._parse_sample(ica_data, sample)

Parses sample inputs into a list of sample IDs

Parameters
  • ica_data (IcaData) – IcaData data object

  • sample (list) – Sequence of sample IDs or “project:condition”

Returns

samples – A list of samples

Return type

list
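To make the two accepted input forms concrete, here is a small sketch of how a “project:condition” string might be resolved to sample IDs. It is an illustration only: `SAMPLE_TABLE` is a hypothetical stand-in for the sample metadata held by an IcaData object, and `parse_sample_sketch` is not the library function.

```python
import re

# Hypothetical sample metadata: sample ID -> project and condition
SAMPLE_TABLE = {
    "s1": {"project": "heat", "condition": "control"},
    "s2": {"project": "heat", "condition": "control"},
    "s3": {"project": "heat", "condition": "shock"},
}


def parse_sample_sketch(sample_table, sample):
    """Return sample IDs for a list of IDs or a 'project:condition' string."""
    if isinstance(sample, str):
        match = re.fullmatch(r"(.*):(.*)", sample)
        if match is None:
            raise ValueError(f"Expected 'project:condition', got {sample!r}")
        project, condition = match.groups()
        return [
            sample_id
            for sample_id, row in sample_table.items()
            if row["project"] == project and row["condition"] == condition
        ]
    # Already a sequence of sample IDs: pass it through as a list
    return list(sample)


control_ids = parse_sample_sketch(SAMPLE_TABLE, "heat:control")  # ["s1", "s2"]
explicit_ids = parse_sample_sketch(SAMPLE_TABLE, ["s3"])         # ["s3"]
```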

pymodulon.util.explained_variance(ica_data, genes=None, samples=None, imodulons=None, reference=None)

Computes the fraction of variance explained by iModulons (from 0 to 1)

Parameters
  • ica_data (IcaData) – IcaData data object

  • genes (str or list, optional) – List of genes to use (default: all genes)

  • samples (str or list, optional) – List of samples to use (default: all samples)

  • imodulons (int or str or list, optional) – List of iModulons to use (default: all iModulons)

  • reference (list, optional) – List of samples that represent the reference condition for the set. If none are provided, uses the dataset-specific reference condition.

Returns

Fraction of variance explained by selected iModulons for selected genes/samples

Return type

float
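One common way to define this fraction is one minus the ratio of the residual sum of squares to the total sum of squares of the reference-centered expression matrix. The sketch below assumes that definition and assumes X has already been centered to the reference condition. `explained_variance_sketch` and the toy matrices are hypothetical, with M holding gene weights (genes × iModulons) and A holding activities (iModulons × samples).

```python
import numpy as np


def explained_variance_sketch(X, M, A, imodulons=None):
    """Fraction of variance in X reconstructed by the selected components."""
    if imodulons is None:
        imodulons = range(M.shape[1])
    idx = np.asarray(list(imodulons), dtype=int)
    reconstruction = M[:, idx] @ A[idx, :]
    total = np.linalg.norm(X) ** 2                    # total sum of squares
    residual = np.linalg.norm(X - reconstruction) ** 2
    return 1.0 - residual / total


rng = np.random.default_rng(0)
M = rng.normal(size=(6, 2))   # 6 genes, 2 iModulons
A = rng.normal(size=(2, 4))   # 2 iModulons, 4 samples
X = M @ A                     # noise-free toy data, already centered

ev_all = explained_variance_sketch(X, M, A)                  # close to 1
ev_first = explained_variance_sketch(X, M, A, imodulons=[0])
ev_none = explained_variance_sketch(X, M, A, imodulons=[])   # 0
```

With noise-free data the full decomposition explains essentially all of the variance, while any single component explains strictly less.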

pymodulon.util.infer_activities(ica_data, data)

Infer iModulon activities for external data

Parameters
  • ica_data (IcaData) – IcaData data object

  • data (DataFrame) – External expression profiles (must be centered to a reference)

Returns

new_activities – Inferred activities for the expression profiles

Return type

DataFrame
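The centering requirement can be illustrated with a short sketch. It assumes activities are obtained by projecting the centered expression profiles onto the M matrix with a pseudoinverse, which is one standard way to solve X ≈ M·A for A. The helper names and toy data are hypothetical, and gene alignment between the two datasets is omitted.

```python
import numpy as np


def center_to_reference(expression, reference_columns):
    """Subtract each gene's mean over the reference samples."""
    reference_mean = expression[:, reference_columns].mean(axis=1, keepdims=True)
    return expression - reference_mean


def infer_activities_sketch(M, centered_expression):
    """Least-squares activities: A = pinv(M) @ X."""
    return np.linalg.pinv(M) @ centered_expression


rng = np.random.default_rng(0)
M = rng.normal(size=(8, 3))               # 8 genes, 3 iModulons
true_A = rng.normal(size=(3, 5))          # activities for 5 new samples
gene_baseline = rng.normal(size=(8, 1))   # per-gene offset in the raw data
raw = M @ true_A + gene_baseline

reference = [0, 1]                        # columns treated as the reference
centered = center_to_reference(raw, reference)
inferred = infer_activities_sketch(M, centered)

# Centering removes the offset, so activities are relative to the reference
expected = true_A - true_A[:, reference].mean(axis=1, keepdims=True)
```

Without the centering step the per-gene offset would leak into the inferred activities, which is why the input must be expressed relative to a reference condition.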

pymodulon.util.mutual_info_distance(x, y)

pymodulon.util.mi(x, y, z=None, k=3, base=2, alpha=0)

Mutual information of x and y (conditioned on z if z is not None). x and y should each be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]] if x is a one-dimensional scalar and we have four samples.

pymodulon.util.entropy(x, k=3, base=2)

The classic Kozachenko-Leonenko (K-L) k-nearest-neighbor continuous entropy estimator. x should be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]] if x is a one-dimensional scalar and we have four samples.
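For intuition, the estimator can be written out for the one-dimensional case in plain Python. This is a brute-force sketch under stated assumptions, not the library code: it uses the standard form ψ(N) − ψ(k) + log 2 + mean(log ρ), where ρ is the distance to the k-th nearest neighbor, and it assumes no duplicate values (ties would give log 0). The names `digamma_int` and `kl_entropy_1d` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(x, k=3, base=2):
    """Entropy estimate for a list of one-element vectors, e.g. [[1.3], [3.7]]."""
    values = [row[0] for row in x]
    n = len(values)
    log_dist_sum = 0.0
    for i, v in enumerate(values):
        # Brute-force distances to every other point, sorted ascending
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        log_dist_sum += math.log(dists[k - 1])   # k-th nearest neighbor
    const = digamma_int(n) - digamma_int(k) + math.log(2)
    return (const + log_dist_sum / n) / math.log(base)


random.seed(0)
samples = [[random.uniform(0.0, 4.0)] for _ in range(1000)]
estimate_bits = kl_entropy_1d(samples)
```

A uniform distribution on an interval of width 4 has a differential entropy of log2(4) = 2 bits, so the estimate for this sample should land close to 2.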

pymodulon.util.add_noise(x, intens=1e-10)

Adds small random noise to x to break degeneracy (tied values).
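Nearest-neighbor estimators take logarithms of neighbor distances, so identical points (distance zero) cause trouble. A tiny random perturbation avoids that without materially changing the data. The sketch below shows the idea on a list of vectors. `add_noise_sketch` is a hypothetical plain-Python illustration, and the use of uniform noise in [0, intens) is an assumption.

```python
import random


def add_noise_sketch(x, intens=1e-10, rng=None):
    """Return a copy of x with uniform noise in [0, intens) added to each entry."""
    rng = rng or random.Random()
    return [[value + intens * rng.random() for value in row] for row in x]


duplicated = [[1.0], [1.0], [2.5]]
jittered = add_noise_sketch(duplicated, rng=random.Random(0))
```

After jittering, the first two points are no longer exactly equal, yet each value has moved by less than the chosen intensity.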

pymodulon.util.build_tree(points)

pymodulon.util.query_neighbors(tree, x, k)

pymodulon.util.avgdigamma(points, dvec)

Finds the number of neighbors within some radius in the marginal space. Returns the expectation value of <psi(nx)>.

pymodulon.util.lnc_correction(tree, points, k, alpha)

pymodulon.util.count_neighbors(tree, x, r)