Environment and taxonomy shape the genomic signature of prokaryotic extremophiles.

Affiliations

School of Computer Science, University of Waterloo, Waterloo, ON, Canada.

Department of Biology, University of Western Ontario, London, ON, Canada.

Publication information

Sci Rep. 2023 Sep 26;13(1):16105. doi: 10.1038/s41598-023-42518-y.

Abstract

This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome. Computational experiments classified/clustered genomic signatures extracted from a curated dataset of [Formula: see text] extremophile (temperature, pH) bacteria and archaea genomes, at multiple scales of analysis, [Formula: see text]. The supervised learning resulted in high accuracies for taxonomic classifications at [Formula: see text], and medium to medium-high accuracies for environment category classifications of the same datasets at [Formula: see text]. For [Formula: see text], our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions, previously attributed to extreme environment adaptations. The unsupervised learning of unlabelled sequences identified several exemplars of hyperthermophilic organisms with large similarities in their genomic signatures, in spite of belonging to different domains in the Tree of Life.
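
To make the notion of a genomic signature concrete, the following is a minimal sketch of a k-mer frequency vector computed from a DNA fragment. It is an illustration under stated assumptions, not the authors' pipeline: the function names `kmer_frequency_vector` and `select_fragment`, the normalization to relative frequencies, the skipping of windows that contain ambiguous bases, and the fixed start offset are all hypothetical choices, and the values of k used in the study are not given in this abstract.

```python
from itertools import product


def select_fragment(genome, length=500_000, start=0):
    """Take one contiguous fragment to represent a genome.

    The abstract says the 500 kbp fragment is arbitrarily selected;
    the start offset used here is an assumption for illustration.
    """
    return genome[start:start + length]


def kmer_frequency_vector(seq, k):
    """Return relative k-mer frequencies in lexicographic order over ACGT.

    Windows containing any symbol other than A, C, G, T (for example N)
    are skipped. This handling is an assumption, not the paper's rule.
    """
    seq = seq.upper()
    # Map each of the 4**k possible k-mers to a fixed vector position.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    toy_genome = "ACGTACGTNNACGT" * 10  # toy input, not real data
    fragment = select_fragment(toy_genome, length=50)
    signature = kmer_frequency_vector(fragment, k=3)
    print(len(signature))  # vector length is 4**k
```

Vectors of this kind, one per genome, are the sort of fixed-length input that supervised classifiers and unsupervised clustering methods can consume directly, which is why the vector length depends only on k and not on the fragment length.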


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c04f/10522608/d9984134c897/41598_2023_42518_Fig1_HTML.jpg
