6. iModulon Thresholds

iModulon thresholds are computed using an outlier test (See Methods section “Determination of the gene coefficient threshold” in Sastry et al 2019). This outlier test relies on the D’Agostino test statistic.

However, if you are working with an understudied organism, we recommend setting the thresholds using the K-means method when creating the :class:~pymodulon.core.IcaData object.

[1]:
from pymodulon.example_data import load_ecoli_data
from pymodulon.core import IcaData
[2]:
ica_data = load_ecoli_data()

6.1. Setting the threshold method

You can choose between the D’Agostino or K-means method of thresholding when creating the :class:~pymodulon.core.IcaData object.

[3]:
test_data = IcaData(ica_data.M,
                    ica_data.A,
                    trn=ica_data.trn,
                    threshold_method='dagostino')

test_data.thresholds['ArgR']
WARNING:root:Using the default dagostino_cutoff of 550. This may not be optimal for your dataset. Use ica_data.reoptimize_thresholds() to find the optimal threshold.
[3]:
0.0804412723

A higher D’Agostino cutoff leads to higher thresholds. The default is 550.

[4]:
test_data = IcaData(ica_data.M,
                    ica_data.A,
                    trn=ica_data.trn,
                    threshold_method='dagostino',
                    dagostino_cutoff=5000)

test_data.thresholds['ArgR']
[4]:
0.26879279035000003

To search for the optimal D’Agostino cutoff, set optimize_cutoff=True. To demonstrate this, we created a mini IcaData object since optimizing the cutoff can take 2-3 minutes.

[5]:
test_data = IcaData(ica_data.M.iloc[:,:10],
                    ica_data.A.iloc[:10,:],
                    trn=ica_data.trn,
                    threshold_method='dagostino',
                    optimize_cutoff=True)

test_data.thresholds['ArgR']
WARNING:root:Optimizing iModulon thresholds, may take 2-3 minutes...
[5]:
0.0945705206

The K-means method is a non-parametric threshold estimation method and is the default if no TRN is supplied. Although the thresholds are fairly close to the D’Agostino method, the D’Agostino method produces the highest accuracy iModulons.

[6]:
test_data = IcaData(ica_data.M,
                    ica_data.A,
                    threshold_method='kmeans')

test_data.thresholds['ArgR']
[6]:
0.1375202039

6.2. Changing the iModulon thresholds

If the TRN file is updated, the D’Agostino test statistic can be re-optimized using the reoptimize_thresholds function. To manually adjust this statistic, use the recompute_thresholds function. To change a single iModulon threshold, use the change_threshold function.

6.2.1. Re-optimizing the threshold

The reoptimize_threshold function may take a few minutes to complete. Fortunately, this only needs to be performed once. You can turn off the progress bar or plot using the progress and plot arguments, respectively.

[7]:
# ica_data.reoptimize_thresholds()

6.2.2. Changing the D’agostino statistic

To test out different D’agostino test statistics, use the recompute_thresholds function

[8]:
ica_data.view_imodulon('GlpR')
[8]:
gene_weight start end strand gene_name length operon COG accession regulator
b2239 0.211384 2349934 2351011 - glpQ 1077 glpTQ Energy production and conversion NC_000913.3 crp,fis,fnr,glpR,ihf,nac,rpoD
b2240 0.306134 2351015 2352374 - glpT 1359 glpTQ Carbohydrate transport and metabolism NC_000913.3 crp,fis,fnr,glpR,ihf,nac,rpoD
b2241 0.375662 2352646 2354275 + glpA 1629 glpABC Energy production and conversion NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2242 0.328961 2354264 2355524 + glpB 1260 glpABC Amino acid transport and metabolism NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2243 0.315752 2355520 2356711 + glpC 1191 glpABC Energy production and conversion NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b3426 0.350034 3562012 3563518 + glpD 1506 glpD Energy production and conversion NC_000913.3 arcA,crp,glpR,rpoD,yieP
b3925 0.084278 4114568 4115579 - glpX 1011 glpFKX Carbohydrate transport and metabolism NC_000913.3 crp,glpR,rpoD
b3926 0.290235 4115713 4117222 - glpK 1509 glpFKX Energy production and conversion NC_000913.3 crp,glpR,rpoD
b3927 0.312307 4117244 4118090 - glpF 846 glpFKX Carbohydrate transport and metabolism NC_000913.3 crp,glpR,rpoD
[9]:
ica_data.recompute_thresholds(300)
[10]:
ica_data.view_imodulon('GlpR')
[10]:
gene_weight start end strand gene_name length operon COG accession regulator
b1199 0.045686 1249124 1249757 - dhaL 633 dhaKLM Carbohydrate transport and metabolism NC_000913.3 arcA,dhaR,fnr,lrp,rpoS
b1200 0.045491 1249767 1250838 - dhaK 1071 dhaKLM Carbohydrate transport and metabolism NC_000913.3 arcA,dhaR,fnr,lrp,rpoS
b2239 0.211384 2349934 2351011 - glpQ 1077 glpTQ Energy production and conversion NC_000913.3 crp,fis,fnr,glpR,ihf,nac,rpoD
b2240 0.306134 2351015 2352374 - glpT 1359 glpTQ Carbohydrate transport and metabolism NC_000913.3 crp,fis,fnr,glpR,ihf,nac,rpoD
b2241 0.375662 2352646 2354275 + glpA 1629 glpABC Energy production and conversion NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2242 0.328961 2354264 2355524 + glpB 1260 glpABC Amino acid transport and metabolism NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2243 0.315752 2355520 2356711 + glpC 1191 glpABC Energy production and conversion NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b3011 0.049324 3155354 3156518 + yqhD 1164 yqhD-dkgA Energy production and conversion NC_000913.3 rpoD,yqhC
b3426 0.350034 3562012 3563518 + glpD 1506 glpD Energy production and conversion NC_000913.3 arcA,crp,glpR,rpoD,yieP
b3925 0.084278 4114568 4115579 - glpX 1011 glpFKX Carbohydrate transport and metabolism NC_000913.3 crp,glpR,rpoD
b3926 0.290235 4115713 4117222 - glpK 1509 glpFKX Energy production and conversion NC_000913.3 crp,glpR,rpoD
b3927 0.312307 4117244 4118090 - glpF 846 glpFKX Carbohydrate transport and metabolism NC_000913.3 crp,glpR,rpoD

6.2.3. Manually changing thresholds

To manually change a threshold, use the change_threshold function.

[11]:
ica_data.change_threshold('GlpR',0.2)
[12]:
ica_data.view_imodulon('GlpR')
[12]:
gene_weight start end strand gene_name length operon COG accession regulator
b2239 0.211384 2349934 2351011 - glpQ 1077 glpTQ Energy production and conversion NC_000913.3 crp,fis,fnr,glpR,ihf,nac,rpoD
b2240 0.306134 2351015 2352374 - glpT 1359 glpTQ Carbohydrate transport and metabolism NC_000913.3 crp,fis,fnr,glpR,ihf,nac,rpoD
b2241 0.375662 2352646 2354275 + glpA 1629 glpABC Energy production and conversion NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2242 0.328961 2354264 2355524 + glpB 1260 glpABC Amino acid transport and metabolism NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b2243 0.315752 2355520 2356711 + glpC 1191 glpABC Energy production and conversion NC_000913.3 arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD
b3426 0.350034 3562012 3563518 + glpD 1506 glpD Energy production and conversion NC_000913.3 arcA,crp,glpR,rpoD,yieP
b3926 0.290235 4115713 4117222 - glpK 1509 glpFKX Energy production and conversion NC_000913.3 crp,glpR,rpoD
b3927 0.312307 4117244 4118090 - glpF 846 glpFKX Carbohydrate transport and metabolism NC_000913.3 crp,glpR,rpoD
[ ]: