Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA.
Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.
Nat Biotechnol. 2024 May;42(5):768-777. doi: 10.1038/s41587-023-01868-8. Epub 2023 Jul 27.
Phylogenetic trees provide a framework for organizing evolutionary histories across the tree of life and aid downstream comparative analyses such as metagenomic identification. Methods that rely on single marker genes such as 16S rRNA have produced trees of limited accuracy with hundreds of thousands of organisms, whereas methods that use genome-wide data do not scale to large numbers of genomes. We introduce updating trees using divide-and-conquer (uDance), a method for updatable genome-wide inference. Its divide-and-conquer strategy refines different parts of the tree independently and can build on existing trees, with high accuracy and scalability. With uDance, we infer a species tree of roughly 200,000 genomes using 387 marker genes, totaling 42.5 billion amino acid residues.