Augmenting microbial phylogenomic signal with tailored marker gene sets.

Authors

Secaira-Morocho Henry, Jiang Xiaofang, Zhu Qiyun

Affiliations

Center for Fundamental and Applied Microbiomics and School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA.

National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

bioRxiv. 2025 Mar 15:2025.03.13.643052. doi: 10.1101/2025.03.13.643052.

Abstract

Phylogenetic marker genes are traditionally selected from a fixed collection of whole genomes evenly distributed across major microbial phyla, covering only a small fraction of gene families. Yet most microbial diversity is found in metagenome-assembled genomes, which are unevenly distributed and harbor gene families that do not fit the criteria of universal orthologous genes. To address these limitations, we systematically evaluate the phylogenetic signal of gene families annotated from the KEGG and EggNOG functional databases for deep microbial phylogenomics. We show that markers selected from an expanded pool of gene families and tailored to the input genomes improve the accuracy of phylogenetic trees across simulated and real-world datasets of whole genomes and metagenome-assembled genomes. This improvement in accuracy over previous markers persists even when metagenome-assembled genomes lack a fraction of open reading frames. The selected markers have functional annotations related to metabolism, cellular processes, and environmental information processing, in addition to replication, translation, and transcription. We introduce TMarSel, a software tool for automated, systematic, and tailored marker selection that does not rely on expert opinion. It provides flexibility in the number of markers and annotation databases while remaining robust against uneven taxon sampling and incomplete genomic data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/11952537/768c0591214c/nihpp-2025.03.13.643052v1-f0001.jpg
