4. Gene enrichment analysis

[1]:

from pymodulon.enrichment import *
from pymodulon.example_data import load_ecoli_data, trn
ica_data = load_ecoli_data()

4.1. General functions

To perform a basic enrichment test between two gene sets, use the compute_enrichment function. Optional arguments:

label: Label for your target gene set (e.g. regulator name or GO term)

[2]:

gene_set = ['b0002','b0003','b0004'] # e.g. an iModulon or other gene set
target_genes = ['b0002','b0005','b0007'] # e.g. a regulon or genes in a COG category
all_genes = ica_data.gene_names # List of all genes in your expression dataset

[3]:

compute_enrichment(gene_set,target_genes,all_genes)

[3]:

pvalue             0.002293
precision          0.333333
recall             0.333333
f1score            0.333333
TP                 1.000000
target_set_size    3.000000
gene_set_size      3.000000
dtype: float64

To perform an enrichment test against a regulon in a TRN table, use the compute_regulon function.

[4]:

compute_regulon_enrichment(gene_set,'lrp',all_genes,trn)

[4]:

pvalue             0.000131
precision          1.000000
recall             0.015000
f1score            0.029557
TP                 3.000000
regulon_size     200.000000
gene_set_size      3.000000
n_regs             1.000000
Name: lrp, dtype: float64

Each row of the TRN table represents a regulatory interaction. The table requires the columns regulator and gene_id.

[5]:

trn.head()

[5]:

	regulator	gene_id	effect
0	FMN	b3041	-
1	L-tryptophan	b3708	+
2	L-tryptophan	b3709	+
3	TPP	b0066	-
4	TPP	b0067	-

To perform an enrichment against all regulons in a TRN table, use the compute_trn_enrichment function. This function includes false detection correction using the Benjamini-Hochberg procedure to compute Q-values. By default, the false detection rate is 1e-5, and can be updated using the fdr argument.

[6]:

compute_trn_enrichment(gene_set,all_genes,trn)

[6]:

	pvalue	precision	recall	f1score	TP	regulon_size	gene_set_size	n_regs	qvalue
thr-tRNA	0.000000e+00	1.0	1.000000	1.000000	3.0	3.0	3.0	1.0	0.000000e+00
ile-tRNA	8.354256e-09	1.0	0.333333	0.500000	3.0	9.0	3.0	1.0	2.923990e-08
lrp	1.306248e-04	1.0	0.015000	0.029557	3.0	200.0	3.0	1.0	3.047911e-04
arcA	1.540883e-03	1.0	0.006608	0.013129	3.0	454.0	3.0	1.0	2.157237e-03
fnr	1.411995e-03	1.0	0.006803	0.013514	3.0	441.0	3.0	1.0	2.157237e-03
crp	3.335561e-03	1.0	0.005111	0.010169	3.0	587.0	3.0	1.0	3.891487e-03

compute_trn_enrichment can perform enrichment against complex regulons in boolean combinations. The max_regs argument determines the number of regulators to include in the complex regulon, and the method argument determines how to combine complex regulons. method='or' computes enrichments against the union of regulons, 'and' computes enrichment against intersection of regulons, and 'both' performs both tests (default).

[7]:

compute_trn_enrichment(gene_set,all_genes,trn,max_regs=2).head()

[7]:

	precision	recall	f1score	TP	regulon_size	gene_set_size	n_regs
fnr+ile-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	2.0
thr-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	1.0
arcA+ile-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	2.0
lrp+thr-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	2.0
arcA+thr-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	2.0

Increasing the number of regulators for complex regulons beyond three can lead to a combinatorial explosion. If you are sure you want to search for larger complex regulons, use force=True

[8]:

compute_trn_enrichment(gene_set,all_genes,trn,max_regs=3,force=True).head()

[8]:

	precision	recall	f1score	TP	regulon_size	gene_set_size	n_regs
arcA+ile-tRNA+lrp	1.0	1.0	1.0	3.0	3.0	3.0	3.0
arcA+ile-tRNA+thr-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	3.0
arcA+lrp+thr-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	3.0
arcA+rpoD+thr-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	3.0
crp+fnr+ile-tRNA	1.0	1.0	1.0	3.0	3.0	3.0	3.0

4.2. Using the `IcaData` object

[9]:

# Compare a single iModulon to a simple regulon
ica_data.compute_regulon_enrichment('GlpR',regulator='glpR')

[9]:

pvalue           0
precision        1
recall           1
f1score          1
TP               9
regulon_size     9
imodulon_size    9
n_regs           1
Name: glpR, dtype: int64

[10]:

# Compare a single iModulon to a complex regulon
ica_data.compute_regulon_enrichment('Copper','cusR/cueR')

[10]:

pvalue           5.471346e-20
precision        1.000000e+00
recall           4.210526e-01
f1score          5.925926e-01
TP               8.000000e+00
regulon_size     1.900000e+01
imodulon_size    8.000000e+00
n_regs           2.000000e+00
Name: cusR/cueR, dtype: float64

[11]:

# Get all significant enrichments for all iModulons.
# Optional parameters are same as the global compute_trn_enrichment function
ica_data.compute_trn_enrichment()

[11]:

	imodulon	regulator	pvalue	qvalue	precision	recall	f1score	TP	regulon_size	imodulon_size	n_regs
0	AllR/AraC/FucR	araC	3.804040e-14	3.423636e-13	0.388889	0.500000	0.437500	7.0	14.0	18.0	1.0
1	AllR/AraC/FucR	allR	3.067669e-13	1.380451e-12	0.333333	0.666667	0.444444	6.0	9.0	18.0	1.0
2	AllR/AraC/FucR	fucR	2.316812e-11	6.950436e-11	0.277778	0.714286	0.400000	5.0	7.0	18.0	1.0
3	AllR/AraC/FucR	rpoD	5.383749e-07	1.211343e-06	0.944444	0.011668	0.023051	17.0	1457.0	18.0	1.0
4	AllR/AraC/FucR	crp	8.866740e-07	1.596013e-06	0.666667	0.020443	0.039669	12.0	587.0	18.0	1.0
...	...	...	...	...	...	...	...	...	...	...	...
181	nitrate-related	narP	9.268494e-13	6.024521e-12	0.571429	0.163265	0.253968	8.0	49.0	14.0	1.0
182	nitrate-related	nsrR	3.762939e-09	1.630607e-08	0.500000	0.086420	0.147368	7.0	81.0	14.0	1.0
183	nitrate-related	fnr	8.576894e-09	2.787490e-08	0.785714	0.024943	0.048352	11.0	441.0	14.0	1.0
184	translation	fnr	2.865424e-08	4.584679e-07	0.583333	0.031746	0.060215	14.0	441.0	24.0	1.0
185	translation	arcA	4.244932e-07	3.395946e-06	0.541667	0.028634	0.054393	13.0	454.0	24.0	1.0

186 rows × 11 columns

4.2.1. Save to sample table

You can directly save iModulon enrichments to the sample table using either the ica_data.compute_regulon_enrichment function or ica_data.compute_trn_enrichment function. Columns unrelated to regulon enrichments (e.g. Function and Category) are left alone.

[12]:

ica_data.imodulon_table.loc[['GlpR']]

[12]:

	regulator	f1score	pvalue	precision	recall	TP	n_genes	n_tf	Category	threshold
GlpR	glpR	1.0	0.0	1.0	1.0	9.0	9	1	Carbon Source Utilization	0.066801

[13]:

ica_data.compute_regulon_enrichment('GlpR',regulator='crp',save=True)
ica_data.imodulon_table.loc[['GlpR']]

[13]:

	regulator	f1score	pvalue	precision	recall	TP	n_genes	n_tf	Category	threshold	regulon_size	imodulon_size	n_regs
GlpR	crp	0.030201	3.567813e-08	1.0	0.015332	9.0	9	1	Carbon Source Utilization	0.066801	587.0	9.0	1.0

[14]:

# The enrichment with the lowest qvalue is saved as the official enrichment
ica_data.compute_trn_enrichment(save=True)
ica_data.imodulon_table

[14]:

	regulator	pvalue	qvalue	precision	recall	f1score	TP	regulon_size	imodulon_size	n_regs	Category	threshold	n_tf	n_genes
AllR/AraC/FucR	araC	3.804040e-14	3.423636e-13	0.388889	0.500000	0.437500	7.0	14.0	18.0	1.0	Carbon Source Utilization	0.086996	3	18
ArcA-1	arcA	6.418412e-20	2.374813e-18	0.660000	0.072687	0.130952	33.0	454.0	50.0	1.0	Energy Metabolism	0.058051	1	50
ArcA-2	appY	7.019326e-18	1.895218e-16	0.320000	0.888889	0.470588	8.0	9.0	25.0	1.0	Energy Metabolism	0.081113	1	25
ArgR	argR	6.026074e-18	7.833897e-17	0.923077	0.098361	0.177778	12.0	122.0	13.0	1.0	Amino Acid and Nucleotide Biosynthesis	0.080441	1	13
AtoC	atoC	1.522277e-12	1.217822e-11	0.666667	1.000000	0.800000	4.0	4.0	6.0	1.0	Miscellaneous Metabolism	0.105756	1	6
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
uncharacterized-4	None	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Uncharacterized	0.074349	0	10
uncharacterized-5	None	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Uncharacterized	0.079179	0	4
uncharacterized-6	None	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Uncharacterized	0.052578	0	54
ydcI-KO	None	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Genomic Alterations	0.099229	0	3
yheO-KO	None	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Genomic Alterations	0.078511	0	16

92 rows × 14 columns

[ ]:

4. Gene enrichment analysis

4.1. General functions

4.2. Using the IcaData object

4.2.1. Save to sample table

4.2. Using the `IcaData` object