Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree.

Affiliations

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Publication information

Nat Biotechnol. 2024 Jan;42(1):139-147. doi: 10.1038/s41587-023-01753-4. Epub 2023 Apr 20.

DOI:10.1038/s41587-023-01753-4

PMID:37081138

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10791578/

Abstract

Current methods for inference of phylogenetic trees require running complex pipelines at substantial computational and labor costs, with additional constraints in sequencing coverage, assembly and annotation quality, especially for large datasets. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes and bypasses traditional steps in phylogeny inference, such as genome assembly, annotation and all-versus-all sequence comparisons, while retaining accuracy. In a benchmark encompassing a broad variety of datasets, Read2Tree is 10-100 times faster than assembly-based approaches and in most cases more accurate, the exception being when sequencing coverage is high and reference species are very distant. Here, to illustrate the broad applicability of the tool, we reconstruct a yeast tree of life of 435 species spanning 590 million years of evolution. We also apply Read2Tree to >10,000 Coronaviridae samples, accurately classifying highly diverse animal samples and near-identical severe acute respiratory syndrome coronavirus 2 sequences on a single tree. The speed, accuracy and versatility of Read2Tree enable comparative genomics at scale.
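
The abstract's central idea, sorting raw reads into groups of corresponding genes without first assembling a genome, can be illustrated with a toy sketch. This is not Read2Tree's implementation: the shared k-mer voting below is a simplified stand-in chosen for brevity, and every name in it (`assign_reads`, `reference_groups`, `min_shared`) is hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference_groups, k):
    """Map each k-mer to the names of the gene groups whose references contain it."""
    index = defaultdict(set)
    for group, seqs in reference_groups.items():
        for seq in seqs:
            for km in kmers(seq, k):
                index[km].add(group)
    return index


def assign_reads(reads, reference_groups, k=5, min_shared=2):
    """Bin each read into the gene group it shares the most k-mers with.

    Reads sharing fewer than min_shared k-mers with every group are dropped.
    (Toy assumption: a real tool would use a proper read aligner instead.)
    """
    index = build_index(reference_groups, k)
    bins = defaultdict(list)
    for read in reads:
        votes = defaultdict(int)
        for km in kmers(read, k):
            for group in index.get(km, ()):
                votes[group] += 1
        if votes:
            best = max(votes, key=votes.get)
            if votes[best] >= min_shared:
                bins[best].append(read)
    return dict(bins)


# Hypothetical reference gene groups and reads, for illustration only.
reference_groups = {
    "geneA": ["ATGGCGTACGTTAGC"],
    "geneB": ["ATGCCCTTTGGGAAA"],
}
reads = ["GCGTACGTT", "CCTTTGGGA", "TTTTTTTTT"]
print(assign_reads(reads, reference_groups))
```

In this sketch the third read matches no reference and is discarded, while the other two land in separate gene bins. Per-gene bins of this kind are what a downstream alignment and tree-building step would consume.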

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be0b/10791578/3be45544cacd/41587_2023_1753_Fig1_HTML.jpg
