Department of Cell Biology, and The Helen L. and Martin S. Kimmel Center for Biology and Medicine, Skirball Institute of Biomolecular Medicine, New York University School of Medicine, New York, New York, USA.
Bioinformatics. 2019 Sep 15;35(18):3224-3231. doi: 10.1093/bioinformatics/btz059.
Optimal growth temperature is a fundamental characteristic of all living organisms. Knowing this temperature is central to studying a prokaryote, the thermal stability and temperature-dependent activity of its gene products, and bioprospecting its genome for thermally adapted proteins. High-throughput sequencing has dramatically increased the availability of genomic information, but the growth temperatures of the source organisms are often unknown. This limits the study and technological application of these species and their genomes. Here, we present a novel method for predicting the growth temperatures of prokaryotes using only genomic sequences.
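"Using only genomic sequences" means every predictor must be computed from the genome or from the sequences derived from it, such as the predicted proteome. The sketch below computes two such features: genomic GC content and the IVYWREL amino-acid fraction, which earlier literature reported to correlate with growth temperature. Choosing these two features is an illustrative assumption. The function names `gc_content` and `ivywrel_fraction` are hypothetical, and this is not the feature set or code from the repository cited below.

```python
from collections import Counter


def gc_content(genome: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (hypothetical helper)."""
    counts = Counter(genome.upper())
    acgt = sum(counts[b] for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / acgt


def ivywrel_fraction(proteins: list[str]) -> float:
    """Fraction of all proteome residues that are I, V, Y, W, R, E or L (hypothetical helper)."""
    residues = "".join(proteins).upper()
    if not residues:
        raise ValueError("empty proteome")
    return sum(residues.count(aa) for aa in "IVYWREL") / len(residues)


print(gc_content("ATGCGC"))        # 4 of 6 bases are G/C
print(ivywrel_fraction(["MIVK"]))  # I and V are 2 of 4 residues
```

In a real pipeline, many such per-genome features would be combined as inputs to a regression model.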
By applying the reverse-ecology principle, which holds that an organism's genome contains identifiable adaptations to its native environment, we can predict a species' optimal growth temperature with a root-mean-square error of 5.17°C and a coefficient of determination (R²) of 0.835. The accuracy can be further improved for specific taxonomic clades or by excluding psychrophiles. This method provides a valuable tool for rapidly estimating an organism's growth temperature when only its genome sequence is known.
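The two accuracy figures above are standard regression metrics. RMSE is the square root of the mean squared difference between predicted and measured temperatures, in °C. R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean measured temperature. A minimal sketch follows. The temperatures in the example are made up for illustration and are not data from the study.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error, in the same units as the inputs (here °C)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted optimal growth temperatures (°C)
measured = [20.0, 37.0, 55.0, 80.0]
predicted = [25.0, 35.0, 50.0, 85.0]
print(rmse(measured, predicted))       # sqrt(79 / 4), about 4.44
print(r_squared(measured, predicted))  # 1 - 79 / 1978, about 0.960
```

Because RMSE is expressed in degrees, it can be read directly as a typical prediction error. R² instead reports the fraction of temperature variance across species that the model explains.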
Source code, genomes analyzed and features calculated are available at: https://github.com/DavidBSauer/OGT_prediction.
Supplementary data are available at Bioinformatics online.