1. Introduction to the `IcaData` object

The pymodulon.core.IcaData object is at the core of the PyModulon package. This object holds all of the data related to the expression dataset, the iModulons, and their annotations.

[1]:

from pymodulon.core import IcaData
from pymodulon import example_data
from pymodulon.io import save_to_json, load_json_model

1.1. Minimum requirements

The IcaData object requires only two matrices, both of which are produced by running Independent Component Analysis (ICA) on an expression dataset. For more information about ICA, see the iModulonDB about page.

M: The iModulon matrix contains the Independent Components (ICs) themselves. Each column represents an IC, and each row contains the weights of one gene across all ICs.

[2]:

M = example_data.M
M.head()

[2]:

	AllR/AraC/FucR	ArcA-1	ArcA-2	ArgR	AtoC	BW25113	Cbl+CysB	CdaR	CecR	Copper	...	thrA-KO	translation	uncharacterized-1	uncharacterized-2	uncharacterized-3	uncharacterized-4	uncharacterized-5	uncharacterized-6	ydcI-KO	yheO-KO
b0002	-0.010888	-0.007717	-0.008502	-0.012186	-0.061489	-0.005599	-0.007377	-0.000795	0.004331	0.001845	...	0.479209	0.035685	0.024778	-0.010660	-0.002123	-0.004416	-0.005428	-0.009219	-0.004345	-0.007838
b0003	-0.011467	0.003042	0.011448	-0.003685	-0.006106	0.006680	-0.043512	0.005107	0.000474	0.007650	...	0.011420	0.040811	0.003324	-0.008424	-0.004415	-0.016126	-0.016476	-0.003497	-0.003583	0.003381
b0004	-0.008693	0.003944	0.012347	-0.008104	0.000585	0.003245	-0.041283	0.006390	0.004260	0.007109	...	0.011339	0.036244	0.003710	-0.005212	0.000700	-0.011096	-0.006140	-0.003155	-0.008418	0.000129
b0005	0.006565	-0.001099	0.009415	-0.008507	0.005399	0.014748	-0.009249	-0.003058	-0.012649	-0.002370	...	-0.015324	0.028972	0.023969	0.000150	0.018497	0.009428	0.001255	-0.006890	-0.028069	0.021534
b0006	-0.006011	0.009889	-0.005555	-0.000152	-0.002454	0.009678	-0.003456	0.002160	-0.001924	-0.000628	...	-0.005661	0.000700	-0.002538	-0.006103	-0.002506	-0.005077	-0.004616	-0.003585	0.001607	0.001285

5 rows × 92 columns

A: The Activity matrix contains the condition-specific activities. Each column represents a sample, and each row contains the activity of one iModulon across all samples.

[3]:

A = example_data.A
A.head()

[3]:

	control__wt_glc__1	control__wt_glc__2	fur__wt_dpd__1	fur__wt_dpd__2	fur__wt_fe__1	fur__wt_fe__2	fur__delfur_dpd__1	fur__delfur_dpd__2	fur__delfur_fe2__1	fur__delfur_fe2__2	...	efeU__menFentC_ale29__1	efeU__menFentC_ale29__2	efeU__menFentC_ale30__1	efeU__menFentC_ale30__2	efeU__menFentCubiC_ale36__1	efeU__menFentCubiC_ale36__2	efeU__menFentCubiC_ale37__1	efeU__menFentCubiC_ale37__2	efeU__menFentCubiC_ale38__1	efeU__menFentCubiC_ale38__2
AllR/AraC/FucR	0.378690	-0.378690	2.457678	2.248678	-0.327344	-0.259164	1.777251	2.690655	0.656937	0.319583	...	1.041336	2.203940	3.698292	0.856998	1.557323	0.337806	0.943742	1.736640	0.499461	1.581476
ArcA-1	-0.440210	0.440210	-5.367360	-5.684301	0.131174	0.348843	-4.436389	-4.770469	-1.799113	-1.474222	...	-6.471714	-6.549861	-3.109145	-2.716183	-2.531192	-1.461022	-0.408849	-0.210397	-5.700321	-6.237836
ArcA-2	0.762258	-0.762258	2.619623	2.900696	3.120724	2.743634	1.989803	1.555835	1.782500	1.530811	...	2.789653	3.959650	1.585147	0.811182	0.300414	2.537535	1.061408	2.634524	0.125513	1.178747
ArgR	-0.289630	0.289630	-10.085719	-13.187916	2.371129	1.861918	-8.708701	-7.881588	-1.237027	-1.235604	...	-11.263744	-10.366813	-0.289217	0.389228	-5.142768	-5.014526	-3.648777	-4.125952	-4.286326	-5.475940
AtoC	0.250770	-0.250770	1.844767	2.055052	0.299345	0.425502	1.801217	1.790987	0.921254	1.410026	...	3.821909	3.306573	2.652394	1.910173	0.927772	1.327549	1.846321	0.909667	2.064662	2.371405

5 rows × 278 columns
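The two matrices fit together as a matrix factorization: ICA approximates the expression matrix as the product of M (genes × iModulons) and A (iModulons × samples). The sketch below uses small random stand-in matrices, not the example dataset, to show how the shapes and labels line up. The names `M_toy`, `A_toy`, and `X_approx` are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy labels standing in for locus tags, iModulon names, and sample names
genes = ['b0002', 'b0003', 'b0004', 'b0005']
imodulons = ['ArcA-1', 'ArgR']
samples = ['control__wt_glc__1', 'control__wt_glc__2', 'fur__wt_dpd__1']

# M: one row per gene, one column per iModulon (random values, not real weights)
M_toy = pd.DataFrame(rng.normal(size=(4, 2)), index=genes, columns=imodulons)
# A: one row per iModulon, one column per sample (random values)
A_toy = pd.DataFrame(rng.normal(size=(2, 3)), index=imodulons, columns=samples)

# The columns of M and the rows of A must refer to the same iModulons
assert list(M_toy.columns) == list(A_toy.index)

# Their product has the shape of an expression matrix: genes x samples
X_approx = M_toy.dot(A_toy)
print(X_approx.shape)
```

In the example dataset the same logic applies: the 92 columns of M correspond to the rows of A, and the 278 columns of A are the samples.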

To create the IcaData object, the M and A matrices can be passed as either filenames or Pandas DataFrames.
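For the filename route, the matrices need to be stored with their row and column labels. The sketch below assumes a CSV layout with labels in the first row and first column; that layout and the filename `M.csv` are assumptions for illustration, not a statement of what PyModulon requires. It writes a toy matrix to disk and reads it back with plain pandas.

```python
import os
import tempfile
import pandas as pd

# Small stand-in for an M matrix (values are made up)
M_toy = pd.DataFrame(
    {'ArcA-1': [0.01, -0.02], 'ArgR': [0.30, 0.00]},
    index=['b0002', 'b0003'],
)

with tempfile.TemporaryDirectory() as tmpdir:
    m_path = os.path.join(tmpdir, 'M.csv')  # hypothetical filename
    M_toy.to_csv(m_path)

    # index_col=0 restores the gene identifiers as the row index
    M_loaded = pd.read_csv(m_path, index_col=0)

print(M_loaded)
```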

[4]:

ica_data = IcaData(M,A)
ica_data

[4]:

<pymodulon.core.IcaData at 0x7fc18620f9d0>

Once loaded, the M and A matrices can be accessed directly from the object.

[5]:

ica_data.M.head()

[5]:

	AllR/AraC/FucR	ArcA-1	ArcA-2	ArgR	AtoC	BW25113	Cbl+CysB	CdaR	CecR	Copper	...	thrA-KO	translation	uncharacterized-1	uncharacterized-2	uncharacterized-3	uncharacterized-4	uncharacterized-5	uncharacterized-6	ydcI-KO	yheO-KO
b0002	-0.010888	-0.007717	-0.008502	-0.012186	-0.061489	-0.005599	-0.007377	-0.000795	0.004331	0.001845	...	0.479209	0.035685	0.024778	-0.010660	-0.002123	-0.004416	-0.005428	-0.009219	-0.004345	-0.007838
b0003	-0.011467	0.003042	0.011448	-0.003685	-0.006106	0.006680	-0.043512	0.005107	0.000474	0.007650	...	0.011420	0.040811	0.003324	-0.008424	-0.004415	-0.016126	-0.016476	-0.003497	-0.003583	0.003381
b0004	-0.008693	0.003944	0.012347	-0.008104	0.000585	0.003245	-0.041283	0.006390	0.004260	0.007109	...	0.011339	0.036244	0.003710	-0.005212	0.000700	-0.011096	-0.006140	-0.003155	-0.008418	0.000129
b0005	0.006565	-0.001099	0.009415	-0.008507	0.005399	0.014748	-0.009249	-0.003058	-0.012649	-0.002370	...	-0.015324	0.028972	0.023969	0.000150	0.018497	0.009428	0.001255	-0.006890	-0.028069	0.021534
b0006	-0.006011	0.009889	-0.005555	-0.000152	-0.002454	0.009678	-0.003456	0.002160	-0.001924	-0.000628	...	-0.005661	0.000700	-0.002538	-0.006103	-0.002506	-0.005077	-0.004616	-0.003585	0.001607	0.001285

5 rows × 92 columns

[6]:

ica_data.A.head()

[6]:

	control__wt_glc__1	control__wt_glc__2	fur__wt_dpd__1	fur__wt_dpd__2	fur__wt_fe__1	fur__wt_fe__2	fur__delfur_dpd__1	fur__delfur_dpd__2	fur__delfur_fe2__1	fur__delfur_fe2__2	...	efeU__menFentC_ale29__1	efeU__menFentC_ale29__2	efeU__menFentC_ale30__1	efeU__menFentC_ale30__2	efeU__menFentCubiC_ale36__1	efeU__menFentCubiC_ale36__2	efeU__menFentCubiC_ale37__1	efeU__menFentCubiC_ale37__2	efeU__menFentCubiC_ale38__1	efeU__menFentCubiC_ale38__2
AllR/AraC/FucR	0.378690	-0.378690	2.457678	2.248678	-0.327344	-0.259164	1.777251	2.690655	0.656937	0.319583	...	1.041336	2.203940	3.698292	0.856998	1.557323	0.337806	0.943742	1.736640	0.499461	1.581476
ArcA-1	-0.440210	0.440210	-5.367360	-5.684301	0.131174	0.348843	-4.436389	-4.770469	-1.799113	-1.474222	...	-6.471714	-6.549861	-3.109145	-2.716183	-2.531192	-1.461022	-0.408849	-0.210397	-5.700321	-6.237836
ArcA-2	0.762258	-0.762258	2.619623	2.900696	3.120724	2.743634	1.989803	1.555835	1.782500	1.530811	...	2.789653	3.959650	1.585147	0.811182	0.300414	2.537535	1.061408	2.634524	0.125513	1.178747
ArgR	-0.289630	0.289630	-10.085719	-13.187916	2.371129	1.861918	-8.708701	-7.881588	-1.237027	-1.235604	...	-11.263744	-10.366813	-0.289217	0.389228	-5.142768	-5.014526	-3.648777	-4.125952	-4.286326	-5.475940
AtoC	0.250770	-0.250770	1.844767	2.055052	0.299345	0.425502	1.801217	1.790987	0.921254	1.410026	...	3.821909	3.306573	2.652394	1.910173	0.927772	1.327549	1.846321	0.909667	2.064662	2.371405

5 rows × 278 columns

If the M and A matrices have row or column names, these are saved as the sample, gene, and iModulon names. Since genes are often renamed once they are characterized, the locus tag is the preferred gene identifier.

[7]:

print('Gene names:',ica_data.gene_names[:5])
print('Sample names:',ica_data.sample_names[:5])
print('iModulon names:',ica_data.imodulon_names[:5])

Gene names: ['b0002', 'b0003', 'b0004', 'b0005', 'b0006']
Sample names: ['control__wt_glc__1', 'control__wt_glc__2', 'fur__wt_dpd__1', 'fur__wt_dpd__2', 'fur__wt_fe__1']
iModulon names: ['AllR/AraC/FucR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC']

1.2. Adding the Expression Matrix

The X matrix contains eXpression data and is primarily used for plotting functions. The column names of the X matrix are the sample names, and the row names are the gene identifiers.

[8]:

X = example_data.X
X.head()

[8]:

	control__wt_glc__1	control__wt_glc__2	fur__wt_dpd__1	fur__wt_dpd__2	fur__wt_fe__1	fur__wt_fe__2	fur__delfur_dpd__1	fur__delfur_dpd__2	fur__delfur_fe2__1	fur__delfur_fe2__2	...	efeU__menFentC_ale29__1	efeU__menFentC_ale29__2	efeU__menFentC_ale30__1	efeU__menFentC_ale30__2	efeU__menFentCubiC_ale36__1	efeU__menFentCubiC_ale36__2	efeU__menFentCubiC_ale37__1	efeU__menFentCubiC_ale37__2	efeU__menFentCubiC_ale38__1	efeU__menFentCubiC_ale38__2
b0002	-0.061772	0.061772	0.636527	0.819793	-0.003615	-0.289353	-1.092023	-0.777289	0.161343	0.145641	...	-0.797097	-0.791859	0.080114	0.102154	0.608180	0.657673	0.813105	0.854813	0.427986	0.484338
b0003	-0.053742	0.053742	0.954439	1.334385	0.307588	0.128414	-0.872563	-0.277893	0.428542	0.391761	...	-0.309105	-0.352535	-0.155074	-0.077145	0.447030	0.439881	0.554528	0.569030	0.154905	0.294799
b0004	-0.065095	0.065095	-0.202697	0.119195	-0.264995	-0.546017	-1.918349	-1.577736	-0.474815	-0.495312	...	-0.184898	-0.225615	0.019575	0.063986	0.483343	0.452754	0.524828	0.581878	0.293239	0.341040
b0005	0.028802	-0.028802	-0.865171	-0.951179	0.428769	0.123564	-1.660351	-1.531147	0.240353	-0.151132	...	-0.308221	-0.581714	0.018820	0.004040	-1.228763	-1.451750	-0.839203	-0.529349	-0.413336	-0.478682
b0006	0.009087	-0.009087	-0.131039	-0.124079	-0.144870	-0.090152	-0.219917	-0.046648	-0.044537	-0.089204	...	1.464603	1.415706	1.230831	1.165153	0.447447	0.458852	0.421417	0.408077	1.151066	1.198529

5 rows × 278 columns

[9]:

ica_data.X = X
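Because X shares its rows with M and its columns with A, a quick label check before assignment can catch mismatched files early. The sketch below runs on toy DataFrames; the helper `check_alignment` is hypothetical and not part of PyModulon.

```python
import pandas as pd

def check_alignment(X, M, A):
    """Return a list of label problems between X, M, and A (empty if none)."""
    problems = []
    if not X.index.equals(M.index):
        problems.append('X rows do not match M rows (genes)')
    if not X.columns.equals(A.columns):
        problems.append('X columns do not match A columns (samples)')
    if not M.columns.equals(A.index):
        problems.append('M columns do not match A rows (iModulons)')
    return problems

genes = ['b0002', 'b0003']
samples = ['control__wt_glc__1', 'control__wt_glc__2']
imodulons = ['ArgR']

# Toy matrices with made-up values
M_toy = pd.DataFrame([[0.1], [0.2]], index=genes, columns=imodulons)
A_toy = pd.DataFrame([[1.0, -1.0]], index=imodulons, columns=samples)
X_toy = pd.DataFrame([[0.1, -0.1], [0.2, -0.2]], index=genes, columns=samples)

ok_result = check_alignment(X_toy, M_toy, A_toy)
# Reversing the gene order makes the row labels disagree with M
bad_result = check_alignment(X_toy.iloc[::-1], M_toy, A_toy)
print(ok_result)
print(bad_result)
```

The check is order-sensitive on purpose, since a matrix with the right genes in the wrong order would still misalign downstream calculations.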

1.3. Adding annotation tables

You may load in additional data tables with information about your samples, genes, or iModulons.

These tables are initially empty, but can be modified like any Pandas DataFrame.

[10]:

ica_data.gene_table.head()

[10]:


b0002
b0003
b0004
b0005
b0006

Annotation tables contain one sample, gene, or iModulon per row, with information about that item in the columns. For example, a gene_table may include the gene function, genomic position, or Cluster of Orthologous Groups (COG) category. See the Creating the Gene Table tutorial for a step-by-step example of how to construct this table. The row labels of the gene_table must match the gene names in the M matrix.

[11]:

gene_table = example_data.gene_table
gene_table.head()

[11]:

	start	end	strand	gene_name	length	operon	COG	accession
b0001	189	255	+	thrL	66	thrLABC	No COG Annotation	NC_000913.3
b0002	336	2799	+	thrA	2463	thrLABC	Amino acid transport and metabolism	NC_000913.3
b0003	2800	3733	+	thrB	933	thrLABC	Amino acid transport and metabolism	NC_000913.3
b0004	3733	5020	+	thrC	1287	thrLABC	Amino acid transport and metabolism	NC_000913.3
b0005	5233	5530	+	yaaX	297	yaaX	Function unknown	NC_000913.3
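One way to see what the matching requirement means in practice is to align an annotation table to the rows of M with plain pandas. The sketch below uses a toy table in which one gene (b0001, as in the table above) is annotated but absent from M, and one gene in M has no annotation. It illustrates the idea only and is not a description of what IcaData does internally.

```python
import pandas as pd

# Toy annotation table; b0001 is annotated but absent from M below
gene_table_toy = pd.DataFrame(
    {'gene_name': ['thrL', 'thrA', 'thrB'], 'strand': ['+', '+', '+']},
    index=['b0001', 'b0002', 'b0003'],
)
M_genes = pd.Index(['b0002', 'b0003', 'b0004'])  # stand-in for M.index

# Keep one row per gene in M, in M's order; unannotated genes become NaN rows
aligned = gene_table_toy.reindex(M_genes)

# Report mismatches in both directions
extra = gene_table_toy.index.difference(M_genes)    # annotated, not in M
missing = M_genes.difference(gene_table_toy.index)  # in M, not annotated
print(aligned)
print('extra:', list(extra), 'missing:', list(missing))
```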

The sample_table contains detailed experimental metadata about each sample. This must be manually created, and can contain information related to the strains or experimental conditions used in the study.

[12]:

sample_table = example_data.sample_table
sample_table.head()

[12]:

	Study	project	condition	Replicate #	Strain Description	Strain	Base Media	Carbon Source (g/L)	Nitrogen Source (g/L)	Electron Acceptor	...	Growth Rate (1/hr)	Evolved Sample	Isolate Type	Sequencing Machine	ALEdb sample	Additional Details	Biological Replicates	Alignment	DOI	GEO
Sample ID
control__wt_glc__1	Control	control	wt_glc	1	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	NaN	No	NaN	MiSeq	NaN	NaN	2	94.33	doi.org/10.1101/080929	GSE65643
control__wt_glc__2	Control	control	wt_glc	2	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	NaN	No	NaN	MiSeq	NaN	NaN	2	94.24	doi.org/10.1101/080929	GSE65643
fur__wt_dpd__1	Fur	fur	wt_dpd	1	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	0.00	No	NaN	MiSeq	NaN	NaN	2	98.04	doi.org/10.1038/ncomms5910	GSE54900
fur__wt_dpd__2	Fur	fur	wt_dpd	2	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	0.00	No	NaN	MiSeq	NaN	NaN	2	98.30	doi.org/10.1038/ncomms5910	GSE54900
fur__wt_fe__1	Fur	fur	wt_fe	1	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	1.06	No	NaN	MiSeq	NaN	NaN	2	93.35	doi.org/10.1038/ncomms5910	GSE54900

5 rows × 26 columns

The project and condition columns in the sample_table will be useful for the plotting functions described in the Plotting Functions tutorial.

The imodulon_table contains information about each iModulon, such as regulator enrichments or iModulon size.

[13]:

imodulon_table = example_data.imodulon_table
imodulon_table.head()

[13]:

	regulator	f1score	pvalue	precision	recall	TP	n_genes	n_tf	Category	threshold
name
AllR/AraC/FucR	allR/araC/fucR	0.750000	1.190000e-41	1.000000	0.600000	18.0	18	3	Carbon Source Utilization	0.086996
ArcA-1	arcA	0.130952	6.420000e-20	0.660000	0.072687	33.0	50	1	Energy Metabolism	0.058051
ArcA-2	arcA	0.087683	1.150000e-16	0.840000	0.046256	21.0	25	1	Energy Metabolism	0.081113
ArgR	argR	0.177778	6.030000e-18	0.923077	0.098361	12.0	13	1	Amino Acid and Nucleotide Biosynthesis	0.080441
AtoC	atoC	0.800000	1.520000e-12	0.666667	1.000000	4.0	6	1	Miscellaneous Metabolism	0.105756

The tables can be loaded into the IcaData object as either filenames or Pandas DataFrames.

[14]:

ica_data.gene_table = gene_table
ica_data.sample_table = sample_table
ica_data.imodulon_table = imodulon_table

[15]:

ica_data.sample_table.head()

[15]:

	Study	project	condition	Replicate #	Strain Description	Strain	Base Media	Carbon Source (g/L)	Nitrogen Source (g/L)	Electron Acceptor	...	Growth Rate (1/hr)	Evolved Sample	Isolate Type	Sequencing Machine	ALEdb sample	Additional Details	Biological Replicates	Alignment	DOI	GEO
control__wt_glc__1	Control	control	wt_glc	1	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	NaN	No	NaN	MiSeq	NaN	NaN	2	94.33	doi.org/10.1101/080929	GSE65643
control__wt_glc__2	Control	control	wt_glc	2	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	NaN	No	NaN	MiSeq	NaN	NaN	2	94.24	doi.org/10.1101/080929	GSE65643
fur__wt_dpd__1	Fur	fur	wt_dpd	1	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	0.00	No	NaN	MiSeq	NaN	NaN	2	98.04	doi.org/10.1038/ncomms5910	GSE54900
fur__wt_dpd__2	Fur	fur	wt_dpd	2	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	0.00	No	NaN	MiSeq	NaN	NaN	2	98.30	doi.org/10.1038/ncomms5910	GSE54900
fur__wt_fe__1	Fur	fur	wt_fe	1	Escherichia coli K-12 MG1655	MG1655	M9	glucose(2)	NH4Cl(1)	O2	...	1.06	No	NaN	MiSeq	NaN	NaN	2	93.35	doi.org/10.1038/ncomms5910	GSE54900

5 rows × 26 columns

[16]:

ica_data.gene_table.head()

[16]:

	start	end	strand	gene_name	length	operon	COG	accession
b0002	336	2799	+	thrA	2463	thrLABC	Amino acid transport and metabolism	NC_000913.3
b0003	2800	3733	+	thrB	933	thrLABC	Amino acid transport and metabolism	NC_000913.3
b0004	3733	5020	+	thrC	1287	thrLABC	Amino acid transport and metabolism	NC_000913.3
b0005	5233	5530	+	yaaX	297	yaaX	Function unknown	NC_000913.3
b0006	5682	6459	-	yaaA	777	yaaA	Function unknown	NC_000913.3

[17]:

ica_data.imodulon_table.head()

[17]:

	regulator	f1score	pvalue	precision	recall	TP	n_genes	n_tf	Category	threshold
AllR/AraC/FucR	allR/araC/fucR	0.750000	1.190000e-41	1.000000	0.600000	18.0	18	3	Carbon Source Utilization	0.086996
ArcA-1	arcA	0.130952	6.420000e-20	0.660000	0.072687	33.0	50	1	Energy Metabolism	0.058051
ArcA-2	arcA	0.087683	1.150000e-16	0.840000	0.046256	21.0	25	1	Energy Metabolism	0.081113
ArgR	argR	0.177778	6.030000e-18	0.923077	0.098361	12.0	13	1	Amino Acid and Nucleotide Biosynthesis	0.080441
AtoC	atoC	0.800000	1.520000e-12	0.666667	1.000000	4.0	6	1	Miscellaneous Metabolism	0.105756

1.4. Converting between gene names and locus tags

If the gene_table contains a gene_name column, the name2num and num2name methods can convert between locus tags and gene names.

[18]:

ica_data.num2name('b0002')

[18]:

'thrA'

[19]:

ica_data.name2num('thrA')

[19]:

'b0002'
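Conceptually, both conversions are lookups in the gene_name column of the gene_table. The sketch below reproduces the two results above with a toy table and plain dictionaries; it assumes gene names are unique, which the inverse mapping needs, and does not claim to mirror the library's implementation.

```python
import pandas as pd

# Toy gene_table with locus tags as the index
gene_table_toy = pd.DataFrame(
    {'gene_name': ['thrA', 'thrB', 'thrC']},
    index=['b0002', 'b0003', 'b0004'],
)

# Locus tag -> gene name is a direct lookup in the gene_name column
num_to_name = gene_table_toy['gene_name'].to_dict()

# Gene name -> locus tag is the inverse mapping (assumes unique gene names)
name_to_num = {name: tag for tag, name in num_to_name.items()}

print(num_to_name['b0002'])
print(name_to_num['thrA'])
```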

1.5. Adding the TRN

Adding the transcriptional regulatory network (TRN) to the IcaData object enables automated calculation of regulon enrichments. Each row of the TRN file represents a regulatory interaction. The TRN must contain the following columns:

regulator: Name of the regulator (/ or + characters will be converted to ;)
gene_id: Locus tag of the target gene (must be in ica_data.gene_names)

The following columns are optional, but are helpful to have:

regulator_id: Locus tag of the regulator
gene_name: Name of the gene (can be filled in automatically from gene_id using num2name)
direction: Direction of regulation (+ for activation, - for repression, ? or NaN for unknown)
evidence: Evidence of regulation (e.g. ChIP-exo, qRT-PCR, SELEX, Motif search)
PMID: Reference for the regulatory interaction
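The column requirements above can be checked on a table before it is attached. The sketch below builds a toy TRN, verifies the two required columns, applies the separator rule for regulator names, and filters out targets that are not in the dataset. The gene `b9999` is made up to demonstrate the filter, and whether PyModulon drops or rejects such rows is not covered here.

```python
import pandas as pd

REQUIRED_COLUMNS = {'regulator', 'gene_id'}

# Toy TRN; b9999 is a made-up locus tag that is not in the dataset
trn_toy = pd.DataFrame({
    'regulator': ['glpR', 'flhD/flhC', 'crp'],
    'gene_id': ['b2239', 'b2241', 'b9999'],
})
gene_names = ['b2239', 'b2240', 'b2241']  # stand-in for ica_data.gene_names

# 1. All required columns must be present
missing_cols = REQUIRED_COLUMNS - set(trn_toy.columns)
assert not missing_cols, f'Missing columns: {missing_cols}'

# 2. Replace '/' or '+' in regulator names with ';'
trn_toy['regulator'] = trn_toy['regulator'].str.replace(r'[/+]', ';', regex=True)

# 3. Keep only interactions whose target gene is in the dataset
trn_clean = trn_toy[trn_toy['gene_id'].isin(gene_names)]
print(trn_clean)
```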

[20]:

trn = example_data.trn
trn.head()

[20]:

	regulator	gene_id	effect
0	FMN	b3041	-
1	L-tryptophan	b3708	+
2	L-tryptophan	b3709	+
3	TPP	b0066	-
4	TPP	b0067	-

Again, this table can be passed in as either a filename or a Pandas DataFrame.

[21]:

ica_data.trn = trn
ica_data.trn.head()

[21]:

	regulator	gene_id	effect
0	FMN	b3041	-
1	L-tryptophan	b3708	+
2	L-tryptophan	b3709	+
3	TPP	b0066	-
4	TPP	b0067	-

1.6. Inspecting iModulons

view_imodulon shows the information about each gene in the iModulon. Most information is retrieved from the gene_table, but the regulator column comes from the trn.

[22]:

ica_data.view_imodulon('GlpR')

[22]:

	gene_weight	start	end	strand	gene_name	length	operon	COG	accession	regulator
b2239	0.211384	2349934	2351011	-	glpQ	1077	glpTQ	Energy production and conversion	NC_000913.3	crp,fis,fnr,glpR,ihf,nac,rpoD
b2240	0.306134	2351015	2352374	-	glpT	1359	glpTQ	Carbohydrate transport and metabolism	NC_000913.3	crp,fis,fnr,glpR,ihf,nac,rpoD
b2241	0.375662	2352646	2354275	+	glpA	1629	glpABC	Energy production and conversion	NC_000913.3	arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2242	0.328961	2354264	2355524	+	glpB	1260	glpABC	Amino acid transport and metabolism	NC_000913.3	arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2243	0.315752	2355520	2356711	+	glpC	1191	glpABC	Energy production and conversion	NC_000913.3	arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b3426	0.350034	3562012	3563518	+	glpD	1506	glpD	Energy production and conversion	NC_000913.3	arcA,crp,glpR,rpoD,yieP
b3926	0.290235	4115713	4117222	-	glpK	1509	glpFKX	Energy production and conversion	NC_000913.3	crp,glpR,rpoD
b3927	0.312307	4117244	4118090	-	glpF	846	glpFKX	Carbohydrate transport and metabolism	NC_000913.3	crp,glpR,rpoD
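A table like the one above can be thought of as a filter followed by a join. The sketch below assumes that a gene belongs to an iModulon when the absolute value of its weight exceeds that iModulon's threshold (the imodulon_table shown earlier has a threshold column). The weights and the cutoff of 0.1 are made up, and the code is an illustration rather than the library's implementation.

```python
import pandas as pd

# Toy gene weights for one iModulon and a toy gene_table
weights = pd.Series(
    {'b2239': 0.21, 'b2240': 0.31, 'b0002': -0.01, 'b0003': 0.02},
    name='gene_weight',
)
gene_table_toy = pd.DataFrame(
    {'gene_name': ['glpQ', 'glpT', 'thrA', 'thrB']},
    index=['b2239', 'b2240', 'b0002', 'b0003'],
)
threshold = 0.1  # assumed cutoff for this sketch

# Select genes whose absolute weight exceeds the cutoff
members = weights[weights.abs() > threshold]

# Attach annotation columns for the selected genes (joined on locus tag)
view = members.to_frame().join(gene_table_toy)
print(view)
```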

1.7. Searching for genes in iModulons

To find which iModulons contain a specific gene, use the imodulons_with method.

[23]:

ica_data.imodulons_with('b2239')

[23]:

['GlpR']

If the gene_table contains a gene_name column, this function will work with either the locus tag or the gene name.

[24]:

ica_data.imodulons_with('carA')

[24]:

['PurR-2']
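The search can be pictured as scanning one row of M and keeping the columns where the gene's weight passes the cutoff. The sketch below assumes the same absolute-weight threshold rule and uses made-up weights, cutoffs, and a small name-to-locus-tag mapping. The function `imodulons_with_sketch` is hypothetical and only imitates the behavior shown above.

```python
import pandas as pd

# Toy M matrix with made-up weights
M_toy = pd.DataFrame(
    {'GlpR': [0.21, 0.00, 0.01], 'PurR-2': [0.02, 0.15, 0.00]},
    index=['b2239', 'b0032', 'b0002'],
)
thresholds = pd.Series({'GlpR': 0.1, 'PurR-2': 0.1})  # assumed cutoffs
name_to_num = {'glpQ': 'b2239', 'carA': 'b0032'}      # toy name lookup

def imodulons_with_sketch(gene):
    """List iModulons where the gene's absolute weight exceeds the cutoff."""
    locus_tag = name_to_num.get(gene, gene)  # accept a name or a locus tag
    row = M_toy.loc[locus_tag].abs()
    return list(row[row > thresholds].index)

print(imodulons_with_sketch('b2239'))
print(imodulons_with_sketch('carA'))
```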

1.8. Renaming iModulons

Individual iModulons can be renamed using the rename_imodulons method.

[25]:

print('Original iModulon Names:', ica_data.imodulon_names[:5])
ica_data.rename_imodulons({'AllR/AraC/FucR':'AllR'})
print('Renamed iModulon Names:', ica_data.imodulon_names[:5])

Original iModulon Names: ['AllR/AraC/FucR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC']
Renamed iModulon Names: ['AllR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC']

These changes are reflected throughout the IcaData object.

[26]:

print('M matrix columns:', ica_data.M.columns[:5])

M matrix columns: Index(['AllR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC'], dtype='object')
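The reason a dedicated method is useful is that iModulon names label several tables at once: the columns of M, the rows of A, and the rows of the imodulon_table. The sketch below shows the equivalent bookkeeping with plain pandas on toy matrices, applying one mapping to both M and A so that they stay consistent.

```python
import pandas as pd

# Toy matrices with made-up values
M_toy = pd.DataFrame([[0.1, 0.2]], index=['b0002'],
                     columns=['AllR/AraC/FucR', 'ArcA-1'])
A_toy = pd.DataFrame([[1.0], [2.0]], index=['AllR/AraC/FucR', 'ArcA-1'],
                     columns=['control__wt_glc__1'])

mapping = {'AllR/AraC/FucR': 'AllR'}

# The same mapping has to be applied to every table keyed by iModulon name
M_toy = M_toy.rename(columns=mapping)
A_toy = A_toy.rename(index=mapping)

print(list(M_toy.columns), list(A_toy.index))
```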

iModulon names can be updated all at once as well.

[27]:

print('Original iModulon Names:', ica_data.imodulon_names[:5])

new_names = ['AllR/AraC/FucR']+ica_data.imodulon_names[1:]

print('New iModulon names:', new_names[:5])

ica_data.imodulon_names = new_names

print('Renamed iModulon Names:', ica_data.imodulon_names[:5])

Original iModulon Names: ['AllR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC']
New iModulon names: ['AllR/AraC/FucR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC']
Renamed iModulon Names: ['AllR/AraC/FucR', 'ArcA-1', 'ArcA-2', 'ArgR', 'AtoC']

1.9. Copying `IcaData` objects

The copy method creates a new IcaData object identical to the old one.
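Copying matters because assigning an object to a second variable only creates another name for the same data. The sketch below illustrates the difference with a hypothetical `ToyContainer` class and the standard library's `copy.deepcopy`. It assumes that a copy is meant to be independent of the original and does not show how IcaData implements its own copy method.

```python
import copy
import pandas as pd

class ToyContainer:
    """Hypothetical stand-in for an object that holds DataFrames."""
    def __init__(self, M):
        self.M = M

original = ToyContainer(pd.DataFrame({'ArgR': [0.1]}, index=['b0002']))

alias = original                     # same object under a second name
duplicate = copy.deepcopy(original)  # independent object with its own data

# Editing the duplicate leaves the original untouched
duplicate.M.loc['b0002', 'ArgR'] = 9.9
print(original.M.loc['b0002', 'ArgR'])
```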

[28]:

ica_data.copy()

[28]:

<pymodulon.core.IcaData at 0x7fc186195650>

1.10. Saving and Loading `IcaData` Objects

To facilitate data sharing, you can save IcaData objects as JSON files that can be easily reloaded.

[29]:

from os import path
from pymodulon.io import save_to_json, load_json_model

[30]:

filepath = path.join('tmp','ecoli_data.json')
save_to_json(ica_data,filepath)

[31]:

ica_data = load_json_model(filepath)

[32]:

ica_data.imodulon_table.head()

[32]:

	regulator	f1score	pvalue	precision	recall	TP	n_genes	n_tf	Category	threshold
AllR/AraC/FucR	allR/araC/fucR	0.750000	1.190000e-41	1.000000	0.600000	18.0	18	3	Carbon Source Utilization	0.086996
ArcA-1	arcA	0.130952	6.420000e-20	0.660000	0.072687	33.0	50	1	Energy Metabolism	0.058051
ArcA-2	arcA	0.087683	1.150000e-16	0.840000	0.046256	21.0	25	1	Energy Metabolism	0.081113
ArgR	argR	0.177778	6.030000e-18	0.923077	0.098361	12.0	13	1	Amino Acid and Nucleotide Biosynthesis	0.080441
AtoC	atoC	0.800000	1.520000e-12	0.666667	1.000000	4.0	6	1	Miscellaneous Metabolism	0.105756
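The round trip above works because labeled tables map naturally onto nested JSON objects. The sketch below saves and restores one toy table with the standard library's `json` module and pandas. The file layout, with one key per table, is an assumption made for illustration; the actual structure written by save_to_json is not described in this tutorial.

```python
import json
import os
import tempfile
import pandas as pd

# Toy table with two columns taken from the imodulon_table shown above
imodulon_table_toy = pd.DataFrame(
    {'regulator': ['arcA', 'argR'], 'threshold': [0.058051, 0.080441]},
    index=['ArcA-1', 'ArgR'],
)

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = os.path.join(tmpdir, 'toy.json')  # hypothetical filename

    # Store the table as a nested dict under its own key (assumed layout)
    payload_out = {'imodulon_table': imodulon_table_toy.to_dict(orient='index')}
    with open(filepath, 'w') as f:
        json.dump(payload_out, f)

    with open(filepath) as f:
        payload_in = json.load(f)

# Rebuild the DataFrame with iModulon names as the row index
restored = pd.DataFrame.from_dict(payload_in['imodulon_table'], orient='index')
print(restored)
```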
