6. iModulon Thresholds
iModulon thresholds are computed using an outlier test (See Methods section “Determination of the gene coefficient threshold” in Sastry et al 2019). This outlier test relies on the D’Agostino test statistic.
However, if you are working with an understudied organism, we recommend setting the thresholds using the K-means method when creating the :class:~pymodulon.core.IcaData
object.
[1]:
from pymodulon.example_data import load_ecoli_data
from pymodulon.core import IcaData
[2]:
ica_data = load_ecoli_data()
6.1. Setting the threshold method
You can choose between the D’Agostino or K-means method of thresholding when creating the :class:~pymodulon.core.IcaData
object.
[3]:
test_data = IcaData(ica_data.M,
ica_data.A,
trn=ica_data.trn,
threshold_method='dagostino')
test_data.thresholds['ArgR']
WARNING:root:Using the default dagostino_cutoff of 550. This may not be optimal for your dataset. Use ica_data.reoptimize_thresholds() to find the optimal threshold.
[3]:
0.0804412723
A higher D’Agostino cutoff leads to higher thresholds. The default is 550.
[4]:
test_data = IcaData(ica_data.M,
ica_data.A,
trn=ica_data.trn,
threshold_method='dagostino',
dagostino_cutoff=5000)
test_data.thresholds['ArgR']
[4]:
0.26879279035000003
To search for the optimal D’Agostino cutoff, set optimize_cutoff=True
. To demonstrate this, we created a mini IcaData object since optimizing the cutoff can take 2-3 minutes.
[5]:
test_data = IcaData(ica_data.M.iloc[:,:10],
ica_data.A.iloc[:10,:],
trn=ica_data.trn,
threshold_method='dagostino',
optimize_cutoff=True)
test_data.thresholds['ArgR']
WARNING:root:Optimizing iModulon thresholds, may take 2-3 minutes...
[5]:
0.0945705206
The K-means method is a non-parametric threshold estimation method and is the default if no TRN is supplied. Although the thresholds are fairly close to the D’Agostino method, the D’Agostino method produces the highest accuracy iModulons.
[6]:
test_data = IcaData(ica_data.M,
ica_data.A,
threshold_method='kmeans')
test_data.thresholds['ArgR']
[6]:
0.1375202039
6.2. Changing the iModulon thresholds
If the TRN file is updated, the D’Agostino test statistic can be re-optimized using the reoptimize_thresholds
function. To manually adjust this statistic, use the recompute_thresholds
function. To change a single iModulon threshold, use the change_threshold
function.
6.2.1. Re-optimizing the threshold
The reoptimize_threshold
function may take a few minutes to complete. Fortunately, this only needs to be performed once. You can turn off the progress bar or plot using the progress
and plot
arguments, respectively.
[7]:
# ica_data.reoptimize_thresholds()
6.2.2. Changing the D’agostino statistic
To test out different D’agostino test statistics, use the recompute_thresholds
function
[8]:
ica_data.view_imodulon('GlpR')
[8]:
gene_weight | start | end | strand | gene_name | length | operon | COG | accession | regulator | |
---|---|---|---|---|---|---|---|---|---|---|
b2239 | 0.211384 | 2349934 | 2351011 | - | glpQ | 1077 | glpTQ | Energy production and conversion | NC_000913.3 | crp,fis,fnr,glpR,ihf,nac,rpoD |
b2240 | 0.306134 | 2351015 | 2352374 | - | glpT | 1359 | glpTQ | Carbohydrate transport and metabolism | NC_000913.3 | crp,fis,fnr,glpR,ihf,nac,rpoD |
b2241 | 0.375662 | 2352646 | 2354275 | + | glpA | 1629 | glpABC | Energy production and conversion | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b2242 | 0.328961 | 2354264 | 2355524 | + | glpB | 1260 | glpABC | Amino acid transport and metabolism | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b2243 | 0.315752 | 2355520 | 2356711 | + | glpC | 1191 | glpABC | Energy production and conversion | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b3426 | 0.350034 | 3562012 | 3563518 | + | glpD | 1506 | glpD | Energy production and conversion | NC_000913.3 | arcA,crp,glpR,rpoD,yieP |
b3925 | 0.084278 | 4114568 | 4115579 | - | glpX | 1011 | glpFKX | Carbohydrate transport and metabolism | NC_000913.3 | crp,glpR,rpoD |
b3926 | 0.290235 | 4115713 | 4117222 | - | glpK | 1509 | glpFKX | Energy production and conversion | NC_000913.3 | crp,glpR,rpoD |
b3927 | 0.312307 | 4117244 | 4118090 | - | glpF | 846 | glpFKX | Carbohydrate transport and metabolism | NC_000913.3 | crp,glpR,rpoD |
[9]:
ica_data.recompute_thresholds(300)
[10]:
ica_data.view_imodulon('GlpR')
[10]:
gene_weight | start | end | strand | gene_name | length | operon | COG | accession | regulator | |
---|---|---|---|---|---|---|---|---|---|---|
b1199 | 0.045686 | 1249124 | 1249757 | - | dhaL | 633 | dhaKLM | Carbohydrate transport and metabolism | NC_000913.3 | arcA,dhaR,fnr,lrp,rpoS |
b1200 | 0.045491 | 1249767 | 1250838 | - | dhaK | 1071 | dhaKLM | Carbohydrate transport and metabolism | NC_000913.3 | arcA,dhaR,fnr,lrp,rpoS |
b2239 | 0.211384 | 2349934 | 2351011 | - | glpQ | 1077 | glpTQ | Energy production and conversion | NC_000913.3 | crp,fis,fnr,glpR,ihf,nac,rpoD |
b2240 | 0.306134 | 2351015 | 2352374 | - | glpT | 1359 | glpTQ | Carbohydrate transport and metabolism | NC_000913.3 | crp,fis,fnr,glpR,ihf,nac,rpoD |
b2241 | 0.375662 | 2352646 | 2354275 | + | glpA | 1629 | glpABC | Energy production and conversion | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b2242 | 0.328961 | 2354264 | 2355524 | + | glpB | 1260 | glpABC | Amino acid transport and metabolism | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b2243 | 0.315752 | 2355520 | 2356711 | + | glpC | 1191 | glpABC | Energy production and conversion | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b3011 | 0.049324 | 3155354 | 3156518 | + | yqhD | 1164 | yqhD-dkgA | Energy production and conversion | NC_000913.3 | rpoD,yqhC |
b3426 | 0.350034 | 3562012 | 3563518 | + | glpD | 1506 | glpD | Energy production and conversion | NC_000913.3 | arcA,crp,glpR,rpoD,yieP |
b3925 | 0.084278 | 4114568 | 4115579 | - | glpX | 1011 | glpFKX | Carbohydrate transport and metabolism | NC_000913.3 | crp,glpR,rpoD |
b3926 | 0.290235 | 4115713 | 4117222 | - | glpK | 1509 | glpFKX | Energy production and conversion | NC_000913.3 | crp,glpR,rpoD |
b3927 | 0.312307 | 4117244 | 4118090 | - | glpF | 846 | glpFKX | Carbohydrate transport and metabolism | NC_000913.3 | crp,glpR,rpoD |
6.2.3. Manually changing thresholds
To manually change a threshold, use the change_threshold
function.
[11]:
ica_data.change_threshold('GlpR',0.2)
[12]:
ica_data.view_imodulon('GlpR')
[12]:
gene_weight | start | end | strand | gene_name | length | operon | COG | accession | regulator | |
---|---|---|---|---|---|---|---|---|---|---|
b2239 | 0.211384 | 2349934 | 2351011 | - | glpQ | 1077 | glpTQ | Energy production and conversion | NC_000913.3 | crp,fis,fnr,glpR,ihf,nac,rpoD |
b2240 | 0.306134 | 2351015 | 2352374 | - | glpT | 1359 | glpTQ | Carbohydrate transport and metabolism | NC_000913.3 | crp,fis,fnr,glpR,ihf,nac,rpoD |
b2241 | 0.375662 | 2352646 | 2354275 | + | glpA | 1629 | glpABC | Energy production and conversion | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b2242 | 0.328961 | 2354264 | 2355524 | + | glpB | 1260 | glpABC | Amino acid transport and metabolism | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b2243 | 0.315752 | 2355520 | 2356711 | + | glpC | 1191 | glpABC | Energy production and conversion | NC_000913.3 | arcA,crp,fis,flhD;flhC,fnr,glpR,rpoD |
b3426 | 0.350034 | 3562012 | 3563518 | + | glpD | 1506 | glpD | Energy production and conversion | NC_000913.3 | arcA,crp,glpR,rpoD,yieP |
b3926 | 0.290235 | 4115713 | 4117222 | - | glpK | 1509 | glpFKX | Energy production and conversion | NC_000913.3 | crp,glpR,rpoD |
b3927 | 0.312307 | 4117244 | 4118090 | - | glpF | 846 | glpFKX | Carbohydrate transport and metabolism | NC_000913.3 | crp,glpR,rpoD |
[ ]: