Technical University of Denmark, Center for Systems Biology, Denmark.
BMC Genomics. 2012;13 Suppl 7(Suppl 7):S3. doi: 10.1186/1471-2164-13-S7-S3. Epub 2012 Dec 13.
The preferred habitat of a given bacterium can hint at which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this habitat preference from a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments.
This study found a total of 40 protein families useful for distinguishing between three thermophilicity classes (thermophiles, mesophiles and psychrophiles). The predictive performance of these protein families was compared to that of 87 basic sequence features (relative use of amino acids and codons, genomic and 16S rDNA AT content, and genome size). Using naïve Bayesian inference, it was possible to predict the optimal temperature range with a Matthews correlation coefficient of up to 0.68. The best predictive performance was always achieved by combining protein families with structural features, rather than using either of these alone. A dedicated computer program was created to perform these predictions.
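As an illustration of the performance measure quoted above, the following is a minimal sketch of a Matthews correlation coefficient for more than two classes. It assumes the common multiclass generalization computed from true and predicted label counts; the abstract does not state which formulation the study used. The function name `multiclass_mcc` and the example labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(true_labels, predicted_labels):
    """Multiclass Matthews correlation coefficient (assumed generalization).

    Returns a value in [-1, 1]; 1 means perfect agreement and 0 means
    no better than chance. Returns 0.0 when the denominator is zero,
    for example when every genome is assigned to a single class.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    total = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    classes = set(true_counts) | set(pred_counts)

    # Covariance-like terms built from per-class counts.
    cov_tp = correct * total - sum(true_counts[k] * pred_counts[k] for k in classes)
    cov_tt = total * total - sum(n * n for n in true_counts.values())
    cov_pp = total * total - sum(n * n for n in pred_counts.values())

    denominator = sqrt(cov_tt * cov_pp)
    return cov_tp / denominator if denominator else 0.0


# Hypothetical example: six genomes, one psychrophile misclassified as a mesophile.
truth = ["thermo", "thermo", "meso", "meso", "psychro", "psychro"]
guess = ["thermo", "thermo", "meso", "meso", "psychro", "meso"]
score = multiclass_mcc(truth, guess)
```

Unlike plain accuracy, this measure accounts for how the predictions are distributed across all three classes, which matters when the classes are of unequal size.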
This study shows that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naïve Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between the genomes of thermophilic, mesophilic and psychrophilic bacteria.