Roback P, Beard J, Baumann D, Gille C, Henry K, Krohn S, Wiste H, Voskuil M I, Rainville C, Rutherford R
Department of Mathematics, Statistics and Computer Science, St. Olaf College, Northfield, MN 55057, USA.
Nucleic Acids Res. 2007;35(15):5085-95. doi: 10.1093/nar/gkm518. Epub 2007 Jul 25.
The prediction of operons in Mycobacterium tuberculosis (MTB) is a first step toward understanding the regulatory network of this pathogen. Here we apply a statistical model using logistic regression to predict operons in MTB. As predictors, our model incorporates intergenic distance and the correlation of gene expression calculated for adjacent gene pairs from over 474 microarray experiments with MTB RNA. We validate our findings with known examples from the literature and experimentation. From this model, we rank each potential operon pair by the strength of evidence for cotranscription, choose a classification threshold with a true positive rate of over 90% at a false positive rate of 9.1%, and use it to construct an operon map for the MTB genome.
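The modeling steps described above (fit a logistic regression on intergenic distance and expression correlation, rank adjacent gene pairs by score, then choose a threshold from the true and false positive rates) can be sketched as follows. This is only an illustrative sketch: the abstract gives no coefficients, training set, or software, so the toy gene pairs, the scaling of distance to units of 100 bp, the learning rate, and all function names are assumptions. Plain gradient descent stands in for whatever fitting procedure the authors used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """Estimated P(same operon) for x = [distance / 100 bp, correlation]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit [bias, w_distance, w_correlation] by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def rates(scores, labels, threshold):
    """TPR and FPR when every pair with score >= threshold is called an operon pair."""
    tp = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 1)
    fp = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 0)
    pos = sum(labels)
    return tp / pos, fp / (len(labels) - pos)

# Made-up toy data, NOT from the paper:
# (intergenic distance in bp, expression correlation, 1 = known operon pair).
TOY_PAIRS = [
    (-4, 0.85, 1), (10, 0.70, 1), (25, 0.90, 1), (40, 0.60, 1), (5, 0.75, 1),
    (150, 0.10, 0), (300, 0.30, 0), (220, -0.05, 0), (90, 0.20, 0), (400, 0.15, 0),
]
X = [[d / 100.0, r] for d, r, _ in TOY_PAIRS]  # rescale distance (assumption)
y = [label for _, _, label in TOY_PAIRS]

w = fit_logistic(X, y)
scores = [predict_proba(w, x) for x in X]

# Rank pairs from strongest to weakest evidence of cotranscription.
ranking = sorted(zip(scores, TOY_PAIRS), key=lambda item: item[0], reverse=True)
for score, (dist, corr, label) in ranking:
    print(f"{score:.3f}  distance={dist:>4} bp  corr={corr:+.2f}  label={label}")

# Inspect the TPR/FPR trade-off at a few candidate thresholds.
for threshold in (0.25, 0.5, 0.75):
    tpr, fpr = rates(scores, y, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

In a real analysis the rates would be estimated on validated operon and non-operon pairs held out from fitting, and the threshold would be picked where the trade-off is acceptable, as with the operating point reported in the abstract.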