

Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES.

Authors

Gupta Anshu, Mirarab Siavash, Turakhia Yatish

Affiliations

Department of Computer Science and Engineering, University of California, San Diego, CA 92093.

Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093.

Publication information

Proc Natl Acad Sci U S A. 2025 May 13;122(19):e2500553122. doi: 10.1073/pnas.2500553122. Epub 2025 May 2.

Abstract

Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging due to the limited automation, substantial domain expertise, and computational resources required by conventional methods. To address this limitation, we present ROADIES, a fully automated pipeline to infer species trees starting from raw genome assemblies. In contrast to the prominent approach, ROADIES incorporates a unique strategy of randomly sampling segments of the input genomes to generate gene trees. This eliminates the need for predefining a set of loci, limiting the analyses to a fixed number of genes, and performing the cumbersome gene annotation and/or whole genome alignment steps. ROADIES also eliminates the need to infer orthology by leveraging existing discordance-aware methods that allow multicopy genes. Using the genomic datasets from large-scale sequencing efforts across four diverse life forms (placental mammals, pomace flies, birds, and budding yeasts), we show that ROADIES infers species trees that are comparable in quality to the state-of-the-art studies but in a fraction of the time and effort, including on challenging datasets with rampant gene tree discordance and complex polyploidy. With its speed, accuracy, and automation, ROADIES has the potential to vastly simplify species tree inference, making it accessible to a broader range of scientists and applications.
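The random-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the ROADIES implementation: the function name `sample_segments`, the in-memory dictionary of contigs, and the `segment_length`, `num_segments`, and `seed` parameters are all assumptions made for illustration. The sketch covers only the drawing of random segments from one input genome; how the pipeline turns sampled segments into gene trees and a species tree is not described in the abstract and is not shown.

```python
import random


def sample_segments(genome, segment_length, num_segments, seed=0):
    """Draw random fixed-length segments from one genome.

    genome: dict mapping contig name -> sequence string
            (a hypothetical in-memory representation, assumed for this sketch).
    Returns a list of (contig_name, start, segment) tuples.
    """
    rng = random.Random(seed)

    # Only contigs long enough to hold a full segment are eligible.
    eligible = [(name, seq) for name, seq in genome.items()
                if len(seq) >= segment_length]
    if not eligible:
        raise ValueError("no contig is long enough for the requested segment length")

    # Weight each contig by its number of valid start positions,
    # so every possible segment in the genome is equally likely.
    weights = [len(seq) - segment_length + 1 for _, seq in eligible]

    segments = []
    for _ in range(num_segments):
        name, seq = rng.choices(eligible, weights=weights, k=1)[0]
        start = rng.randrange(len(seq) - segment_length + 1)
        segments.append((name, start, seq[start:start + segment_length]))
    return segments


if __name__ == "__main__":
    # Toy genome; real assemblies would be read from FASTA files.
    toy_genome = {"contig1": "ACGT" * 10, "contig2": "AC"}
    for name, start, segment in sample_segments(toy_genome, 8, 3, seed=1):
        print(name, start, segment)
```

Because the positions are drawn at random rather than taken from a predefined locus set, nothing in this step depends on gene annotation or on a whole-genome alignment, which is the property the abstract emphasizes.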


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74d3/12088440/a05c5fee1579/pnas.2500553122fig01.jpg
