Jensen Dan B, Ussery David W
Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.
Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark; Comparative Genomics Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.
F1000Res. 2013 Sep 13;2:184. doi: 10.12688/f1000research.2-184.v1. eCollection 2013.
Predicting the optimal habitat conditions for a given bacterium from its genome sequence alone would be valuable for both scientific and industrial purposes. One example of such a habitat adaptation is the requirement for oxygen. Despite the wide availability of genome data, there have been only a few attempts to predict bacterial oxygen requirements from genome sequences. Here, we describe a method for distinguishing aerobic, anaerobic and facultative anaerobic bacteria from genome sequence-derived input using naive Bayesian inference. In contrast, other studies in the literature only demonstrate the ability to distinguish two classes at a time.
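For orientation, the textbook form of naive Bayesian inference can be written as follows. This is a generic sketch, not necessarily the exact formulation used in the study. The symbols are assumptions introduced here: c is one of the oxygen-requirement classes, x_i is a binary genome-derived feature, and θ_ic is the estimated probability that feature i is present in class c.

```latex
\[
P(c \mid x_1,\dots,x_n) \;\propto\; P(c)\prod_{i=1}^{n} P(x_i \mid c),
\qquad
P(x_i \mid c) = \theta_{ic}^{\,x_i}\,(1-\theta_{ic})^{\,1-x_i},
\]
\[
\hat{c} = \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{n}\log P(x_i \mid c)\Big].
\]
```

The "naive" part is the product over features, which treats them as conditionally independent given the class.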
When directly compared, the results of the present study are as good as or better than those of comparable methods previously described in the scientific literature, and the method is arguably simpler. This study further compares the performance of a single-step naive Bayesian prediction of the three classes with that of a simple two-step Bayesian network. The two-step network, which first distinguishes respiring from non-respiring organisms and then distinguishes aerobes from facultative anaerobes within the respiring group, is found to perform best.
A simple naive Bayesian network based on the presence or absence of specific protein domains within a genome is an effective and easy way to predict bacterial habitat preferences, such as oxygen requirement.
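The two-step scheme described above can be illustrated with a minimal sketch, assuming a Bernoulli naive Bayes model with Laplace smoothing over domain presence/absence. This is not the authors' implementation. The function names, the domain identifiers (`DOM_A`, `DOM_B`, `DOM_C`) and the toy genomes are hypothetical, and treating the anaerobic class as the non-respiring group is a reading of the abstract rather than a confirmed detail.

```python
import math

def train_nb(samples, labels, domains, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on domain presence/absence.

    samples: list of sets, each holding the domain IDs present in one genome.
    alpha:   Laplace smoothing constant (an assumption, not from the source).
    """
    n = len(labels)
    model = {"domains": list(domains), "prior": {}, "theta": {}}
    for c in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == c]
        model["prior"][c] = len(members) / n
        # Smoothed probability that each domain is present in class c.
        model["theta"][c] = {
            d: (sum(d in s for s in members) + alpha) / (len(members) + 2 * alpha)
            for d in domains
        }
    return model

def predict_nb(model, sample):
    """Return the class with the highest log-posterior for one genome."""
    best, best_score = None, -math.inf
    for c, prior in model["prior"].items():
        score = math.log(prior)
        for d in model["domains"]:
            p = model["theta"][c][d]
            score += math.log(p if d in sample else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

def train_two_step(samples, labels, domains):
    """Step 1: respiring vs. non-respiring. Step 2: aerobe vs. facultative."""
    step1_labels = ["non-respiring" if y == "anaerobe" else "respiring"
                    for y in labels]
    step1 = train_nb(samples, step1_labels, domains)
    respiring = [(s, y) for s, y in zip(samples, labels) if y != "anaerobe"]
    step2 = train_nb([s for s, _ in respiring],
                     [y for _, y in respiring], domains)
    return step1, step2

def predict_two_step(step1, step2, sample):
    """Only genomes called 'respiring' in step 1 reach the step 2 classifier."""
    if predict_nb(step1, sample) == "non-respiring":
        return "anaerobe"
    return predict_nb(step2, sample)

# Hypothetical toy data: six genomes described by three made-up domains.
DOMAINS = ["DOM_A", "DOM_B", "DOM_C"]
GENOMES = [{"DOM_A", "DOM_B"}, {"DOM_A", "DOM_B"},
           {"DOM_A"}, {"DOM_A", "DOM_C"},
           {"DOM_C"}, {"DOM_C"}]
LABELS = ["aerobe", "aerobe", "facultative", "facultative",
          "anaerobe", "anaerobe"]

step1, step2 = train_two_step(GENOMES, LABELS, DOMAINS)
for genome in ({"DOM_A", "DOM_B"}, {"DOM_A", "DOM_C"}, {"DOM_C"}):
    print(sorted(genome), "->", predict_two_step(step1, step2, genome))
```

A single-step variant would call `train_nb` once on the three original labels. The cascade instead lets each step estimate domain probabilities against a two-class contrast.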