Tran Thao T, Dam Phuongan, Su Zhengchang, Poole Farris L, Adams Michael W W, Zhou G Tong, Xu Ying
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Nucleic Acids Res. 2007;35(1):11-20. doi: 10.1093/nar/gkl974. Epub 2006 Dec 5.
Identification of operons in the hyperthermophilic archaeon Pyrococcus furiosus represents an important step toward understanding the regulatory mechanisms that enable the organism to adapt and thrive in extreme environments. We have predicted operons in P. furiosus by combining the results from three existing algorithms using a neural network (NN). These algorithms use intergenic distances, phylogenetic profiles, functional categories and gene-order conservation in their operon prediction. Our method takes as inputs the confidence scores of the three programs, and outputs a prediction of whether adjacent genes on the same strand belong to the same operon. In addition, we have applied Gene Ontology (GO) and KEGG pathway information to improve the accuracy of our algorithm. The parameters of this NN predictor are trained on a subset of all experimentally verified operon gene pairs of Bacillus subtilis. The predictor subsequently achieved 86.5% prediction accuracy when applied to a subset of gene pairs for Escherichia coli, which is substantially better than any of the three prediction programs alone. Using this new algorithm, we predicted 470 operons in the P. furiosus genome. Of these, 349 were validated using DNA microarray data.
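The pairwise scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the network size, the weights, the 0.5 threshold, and the names `pair_probability` and `group_operons` are all hypothetical, and the abstract does not say how pairwise predictions were assembled into operons. The sketch assumes a single hidden layer with sigmoid units, and it assumes that an operon is reported only when at least two consecutive same-strand genes are linked.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pair_probability(
    scores: Sequence[float],
    hidden_w: Sequence[Sequence[float]],
    hidden_b: Sequence[float],
    out_w: Sequence[float],
    out_b: float,
) -> float:
    """Forward pass of a one-hidden-layer network.

    `scores` holds the confidence scores of the three existing programs
    for one adjacent gene pair. All weights are hypothetical placeholders;
    in the paper they are trained on verified B. subtilis gene pairs.
    """
    hidden = [
        sigmoid(sum(w * s for w, s in zip(row, scores)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)


def group_operons(
    genes: Sequence[Tuple[str, str]],
    pair_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Chain adjacent gene pairs into operons.

    `genes` is a list of (name, strand) tuples in genome order, and
    `pair_probs[i]` is the predicted probability that genes[i] and
    genes[i + 1] share an operon. Only same-strand pairs are linked.
    Assumption: runs of a single gene are not reported as operons.
    """
    if not genes:
        return []
    operons: List[List[str]] = []
    current = [genes[0][0]]
    for i, prob in enumerate(pair_probs):
        same_strand = genes[i][1] == genes[i + 1][1]
        if same_strand and prob >= threshold:
            current.append(genes[i + 1][0])
        else:
            if len(current) > 1:
                operons.append(current)
            current = [genes[i + 1][0]]
    if len(current) > 1:
        operons.append(current)
    return operons
```

In use, `pair_probability` would be called once per adjacent gene pair with trained weights, and the resulting list of probabilities passed to `group_operons` together with the ordered gene list. The threshold controls the trade-off between merging unrelated genes and splitting true operons, so it would need to be tuned on held-out data.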