4. Gene enrichment analysis

[1]:
from pymodulon.enrichment import *
from pymodulon.example_data import load_ecoli_data, trn
ica_data = load_ecoli_data()

4.1. General functions

To perform a basic enrichment test between two gene sets, use the compute_enrichment function. Optional arguments:

  • label: Label for your target gene set (e.g. regulator name or GO term)

[2]:
gene_set = ['b0002','b0003','b0004'] # e.g. an iModulon or other gene set
target_genes = ['b0002','b0005','b0007'] # e.g. a regulon or genes in a COG category
all_genes = ica_data.gene_names # List of all genes in your expression dataset
[3]:
compute_enrichment(gene_set,target_genes,all_genes)
[3]:
pvalue             0.002293
precision          0.333333
recall             0.333333
f1score            0.333333
TP                 1.000000
target_set_size    3.000000
gene_set_size      3.000000
dtype: float64

To perform an enrichment test against a regulon in a TRN table, use the compute_regulon function.

[4]:
compute_regulon_enrichment(gene_set,'lrp',all_genes,trn)
[4]:
pvalue             0.000131
precision          1.000000
recall             0.015000
f1score            0.029557
TP                 3.000000
regulon_size     200.000000
gene_set_size      3.000000
n_regs             1.000000
Name: lrp, dtype: float64

Each row of the TRN table represents a regulatory interaction. The table requires the columns regulator and gene_id.

[5]:
trn.head()
[5]:
regulator gene_id effect
0 FMN b3041 -
1 L-tryptophan b3708 +
2 L-tryptophan b3709 +
3 TPP b0066 -
4 TPP b0067 -

To perform an enrichment against all regulons in a TRN table, use the compute_trn_enrichment function. This function includes false detection correction using the Benjamini-Hochberg procedure to compute Q-values. By default, the false detection rate is 1e-5, and can be updated using the fdr argument.

[6]:
compute_trn_enrichment(gene_set,all_genes,trn)
[6]:
pvalue precision recall f1score TP regulon_size gene_set_size n_regs qvalue
thr-tRNA 0.000000e+00 1.0 1.000000 1.000000 3.0 3.0 3.0 1.0 0.000000e+00
ile-tRNA 8.354256e-09 1.0 0.333333 0.500000 3.0 9.0 3.0 1.0 2.923990e-08
lrp 1.306248e-04 1.0 0.015000 0.029557 3.0 200.0 3.0 1.0 3.047911e-04
arcA 1.540883e-03 1.0 0.006608 0.013129 3.0 454.0 3.0 1.0 2.157237e-03
fnr 1.411995e-03 1.0 0.006803 0.013514 3.0 441.0 3.0 1.0 2.157237e-03
crp 3.335561e-03 1.0 0.005111 0.010169 3.0 587.0 3.0 1.0 3.891487e-03

compute_trn_enrichment can perform enrichment against complex regulons in boolean combinations. The max_regs argument determines the number of regulators to include in the complex regulon, and the method argument determines how to combine complex regulons. method='or' computes enrichments against the union of regulons, 'and' computes enrichment against intersection of regulons, and 'both' performs both tests (default).

[7]:
compute_trn_enrichment(gene_set,all_genes,trn,max_regs=2).head()
[7]:
pvalue precision recall f1score TP regulon_size gene_set_size n_regs qvalue
fnr+ile-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 2.0 0.0
thr-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 1.0 0.0
arcA+ile-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 2.0 0.0
lrp+thr-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 2.0 0.0
arcA+thr-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 2.0 0.0

Increasing the number of regulators for complex regulons beyond three can lead to a combinatorial explosion. If you are sure you want to search for larger complex regulons, use force=True

[8]:
compute_trn_enrichment(gene_set,all_genes,trn,max_regs=3,force=True).head()
[8]:
pvalue precision recall f1score TP regulon_size gene_set_size n_regs qvalue
arcA+ile-tRNA+lrp 0.0 1.0 1.0 1.0 3.0 3.0 3.0 3.0 0.0
arcA+ile-tRNA+thr-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 3.0 0.0
arcA+lrp+thr-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 3.0 0.0
arcA+rpoD+thr-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 3.0 0.0
crp+fnr+ile-tRNA 0.0 1.0 1.0 1.0 3.0 3.0 3.0 3.0 0.0

4.2. Using the IcaData object

[9]:
# Compare a single iModulon to a simple regulon
ica_data.compute_regulon_enrichment('GlpR',regulator='glpR')
[9]:
pvalue           0
precision        1
recall           1
f1score          1
TP               9
regulon_size     9
imodulon_size    9
n_regs           1
Name: glpR, dtype: int64
[10]:
# Compare a single iModulon to a complex regulon
ica_data.compute_regulon_enrichment('Copper','cusR/cueR')
[10]:
pvalue           5.471346e-20
precision        1.000000e+00
recall           4.210526e-01
f1score          5.925926e-01
TP               8.000000e+00
regulon_size     1.900000e+01
imodulon_size    8.000000e+00
n_regs           2.000000e+00
Name: cusR/cueR, dtype: float64
[11]:
# Get all significant enrichments for all iModulons.
# Optional parameters are same as the global compute_trn_enrichment function
ica_data.compute_trn_enrichment()
[11]:
imodulon regulator pvalue qvalue precision recall f1score TP regulon_size imodulon_size n_regs
0 AllR/AraC/FucR araC 3.804040e-14 3.423636e-13 0.388889 0.500000 0.437500 7.0 14.0 18.0 1.0
1 AllR/AraC/FucR allR 3.067669e-13 1.380451e-12 0.333333 0.666667 0.444444 6.0 9.0 18.0 1.0
2 AllR/AraC/FucR fucR 2.316812e-11 6.950436e-11 0.277778 0.714286 0.400000 5.0 7.0 18.0 1.0
3 AllR/AraC/FucR rpoD 5.383749e-07 1.211343e-06 0.944444 0.011668 0.023051 17.0 1457.0 18.0 1.0
4 AllR/AraC/FucR crp 8.866740e-07 1.596013e-06 0.666667 0.020443 0.039669 12.0 587.0 18.0 1.0
... ... ... ... ... ... ... ... ... ... ... ...
181 nitrate-related narP 9.268494e-13 6.024521e-12 0.571429 0.163265 0.253968 8.0 49.0 14.0 1.0
182 nitrate-related nsrR 3.762939e-09 1.630607e-08 0.500000 0.086420 0.147368 7.0 81.0 14.0 1.0
183 nitrate-related fnr 8.576894e-09 2.787490e-08 0.785714 0.024943 0.048352 11.0 441.0 14.0 1.0
184 translation fnr 2.865424e-08 4.584679e-07 0.583333 0.031746 0.060215 14.0 441.0 24.0 1.0
185 translation arcA 4.244932e-07 3.395946e-06 0.541667 0.028634 0.054393 13.0 454.0 24.0 1.0

186 rows × 11 columns

4.2.1. Save to sample table

You can directly save iModulon enrichments to the sample table using either the ica_data.compute_regulon_enrichment function or ica_data.compute_trn_enrichment function. Columns unrelated to regulon enrichments (e.g. Function and Category) are left alone.

[12]:
ica_data.imodulon_table.loc[['GlpR']]
[12]:
regulator f1score pvalue precision recall TP n_genes n_tf Category threshold
GlpR glpR 1.0 0.0 1.0 1.0 9.0 9 1 Carbon Source Utilization 0.066801
[13]:
ica_data.compute_regulon_enrichment('GlpR',regulator='crp',save=True)
ica_data.imodulon_table.loc[['GlpR']]
[13]:
regulator f1score pvalue precision recall TP n_genes n_tf Category threshold regulon_size imodulon_size n_regs
GlpR crp 0.030201 3.567813e-08 1.0 0.015332 9.0 9 1 Carbon Source Utilization 0.066801 587.0 9.0 1.0
[14]:
# The enrichment with the lowest qvalue is saved as the official enrichment
ica_data.compute_trn_enrichment(save=True)
ica_data.imodulon_table
[14]:
regulator pvalue qvalue precision recall f1score TP regulon_size imodulon_size n_regs Category threshold n_tf n_genes
AllR/AraC/FucR araC 3.804040e-14 3.423636e-13 0.388889 0.500000 0.437500 7.0 14.0 18.0 1.0 Carbon Source Utilization 0.086996 3 18
ArcA-1 arcA 6.418412e-20 2.374813e-18 0.660000 0.072687 0.130952 33.0 454.0 50.0 1.0 Energy Metabolism 0.058051 1 50
ArcA-2 appY 7.019326e-18 1.895218e-16 0.320000 0.888889 0.470588 8.0 9.0 25.0 1.0 Energy Metabolism 0.081113 1 25
ArgR argR 6.026074e-18 7.833897e-17 0.923077 0.098361 0.177778 12.0 122.0 13.0 1.0 Amino Acid and Nucleotide Biosynthesis 0.080441 1 13
AtoC atoC 1.522277e-12 1.217822e-11 0.666667 1.000000 0.800000 4.0 4.0 6.0 1.0 Miscellaneous Metabolism 0.105756 1 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
uncharacterized-4 None NaN NaN NaN NaN NaN NaN NaN NaN NaN Uncharacterized 0.074349 0 10
uncharacterized-5 None NaN NaN NaN NaN NaN NaN NaN NaN NaN Uncharacterized 0.079179 0 4
uncharacterized-6 None NaN NaN NaN NaN NaN NaN NaN NaN NaN Uncharacterized 0.052578 0 54
ydcI-KO None NaN NaN NaN NaN NaN NaN NaN NaN NaN Genomic Alterations 0.099229 0 3
yheO-KO None NaN NaN NaN NaN NaN NaN NaN NaN NaN Genomic Alterations 0.078511 0 16

92 rows × 14 columns

[ ]: