Augmenting microbial phylogenomic signal with tailored marker gene sets.

Authors

Secaira-Morocho Henry, Jiang Xiaofang, Zhu Qiyun

Affiliations

Center for Fundamental and Applied Microbiomics and School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA.

National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

bioRxiv. 2025 Mar 15:2025.03.13.643052. doi: 10.1101/2025.03.13.643052.

DOI: 10.1101/2025.03.13.643052
PMID: 40161675
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11952537/

Abstract

Phylogenetic marker genes are traditionally selected from a fixed collection of whole genomes evenly distributed across major microbial phyla, covering only a small fraction of gene families. And yet, most microbial diversity is found in metagenome-assembled genomes that are unevenly distributed and harbor gene families that do not fit the criteria of universal orthologous genes. To address these limitations, we systematically evaluate the phylogenetic signal of gene families annotated from KEGG and EggNOG functional databases for deep microbial phylogenomics. We show that markers selected from an expanded pool of gene families and tailored to the input genomes improve the accuracy of phylogenetic trees across simulated and real-world datasets of whole genomes and metagenome-assembled genomes. The improved accuracy of trees compared to previous markers persists even when metagenome-assembled genomes lack a fraction of open reading frames. The selected markers have functional annotations related to metabolism, cellular processes, and environmental information processing, in addition to replication, translation, and transcription. We introduce TMarSel, a software tool for automated, systematic, free-from-expert opinion, and tailored marker selection that provides flexibility in the number of markers and annotation databases while remaining robust against uneven taxon sampling and incomplete genomic data.
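To make the idea of markers "tailored to the input genomes" concrete, here is a minimal Python sketch. It is not TMarSel's algorithm, which the abstract does not describe. It assumes a simple stand-in criterion: rank gene families by the number of input genomes in which they are annotated and keep the top k. The function name `select_markers`, the input layout, and the family and genome identifiers are all hypothetical.

```python
from collections import Counter


def select_markers(annotations, k):
    """Pick k gene families found in the largest number of input genomes.

    annotations: dict mapping a genome ID to an iterable of gene family IDs,
                 one per annotated ORF (hypothetical input layout).
    k:           number of marker families to return.

    Assumption: genome coverage is used as the ranking criterion purely for
    illustration; the published tool may score families differently.
    """
    coverage = Counter()
    for families in annotations.values():
        # Count each family once per genome, ignoring extra copies.
        coverage.update(set(families))
    # Sort by descending coverage, breaking ties by family ID for determinism.
    ranked = sorted(coverage, key=lambda fam: (-coverage[fam], fam))
    return ranked[:k]


# Toy input with placeholder identifiers (not real annotations).
annotations = {
    "genome_A": ["fam1", "fam2", "fam3"],
    "genome_B": ["fam1", "fam3", "fam3"],
    "mag_C": ["fam1", "fam4"],
}
markers = select_markers(annotations, k=2)
print(markers)
```

Because the ranking is computed from whatever genomes are supplied, the selected set changes with the input, which is the sense in which the markers are tailored rather than fixed in advance.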

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/11952537/78431c44ff45/nihpp-2025.03.13.643052v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/11952537/768c0591214c/nihpp-2025.03.13.643052v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/11952537/87ca89999625/nihpp-2025.03.13.643052v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9002/11952537/ea5230d77088/nihpp-2025.03.13.643052v1-f0003.jpg
