EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences.

Affiliations

Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany.

Department of Computer Engineering, University of A Coruña, 15071 A Coruña, Spain.

Publication information

Syst Biol. 2019 Mar 1;68(2):365-369. doi: 10.1093/sysbio/syy054.

DOI:10.1093/sysbio/syy054

PMID:30165689

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6368480/

Abstract

Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both, RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.
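As a rough, back-of-envelope reading of the scalability figures quoted in the abstract (this is not a calculation taken from the paper), the sketch below converts "1 billion reads in just under 7 h on 2048 cores" into throughput. Because the run finished in under 7 h, the resulting rates are lower bounds; the variable names are illustrative only.

```python
# Figures quoted in the abstract. The run took "just under" 7 h,
# so the rates computed here are lower bounds, not measured values.
reads = 1_000_000_000   # metagenetic reads placed
hours = 7               # upper bound on wall-clock time
cores = 2048            # cores used in the distributed run

seconds = hours * 3600
aggregate_rate = reads / seconds        # reads per second across all cores
per_core_rate = aggregate_rate / cores  # reads per second per core

print(f"aggregate: at least {aggregate_rate:,.0f} reads/s")  # 39,683
print(f"per core:  at least {per_core_rate:.1f} reads/s")    # 19.4
```

These numbers average over the whole run, including any start-up and I/O overhead, so they say nothing about peak placement speed.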

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d047/6368480/e0d60e693364/syy054f1.jpg

Similar articles

SCAMPP: Scaling Alignment-Based Phylogenetic Placement to Large Trees.

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1417-1430. doi: 10.1109/TCBB.2022.3170386. Epub 2023 Apr 3.

Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG.

Bioinformatics. 2021 Nov 18;37(22):4056-4063. doi: 10.1093/bioinformatics/btab399.

pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.

BMC Bioinformatics. 2010 Oct 30;11:538. doi: 10.1186/1471-2105-11-538.

SEPP: SATé-enabled phylogenetic placement.

Pac Symp Biocomput. 2012:247-58. doi: 10.1142/9789814366496_0024.

RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference.

Bioinformatics. 2019 Nov 1;35(21):4453-4455. doi: 10.1093/bioinformatics/btz305.

A fast and memory-efficient implementation of the transfer bootstrap.

Bioinformatics. 2020 Apr 1;36(7):2280-2281. doi: 10.1093/bioinformatics/btz874.

Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood.

Syst Biol. 2011 May;60(3):291-302. doi: 10.1093/sysbio/syr010. Epub 2011 Mar 23.

SCAMPP+FastTree: improving scalability for likelihood-based phylogenetic placement.

Bioinform Adv. 2023 Jan 30;3(1):vbad008. doi: 10.1093/bioadv/vbad008. eCollection 2023.

ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads.

Bioinformatics. 2018 Mar 15;34(6):928-935. doi: 10.1093/bioinformatics/btx702.

Cited by

Subspecific variation in gut microbiota of North American bison in a sympatric setting reveals differentially abundant taxa.

Anim Microbiome. 2025 Aug 21;7(1):89. doi: 10.1186/s42523-025-00451-7.

Dysbiosis and depression: A study of gut microbiota alterations and functional pathways in antidepressant-naïve mood disorder patients.

Transl Psychiatry. 2025 Aug 18;15(1):290. doi: 10.1038/s41398-025-03521-1.

metabolic interaction network of a rationally designed nasal microbiota community.

iScience. 2025 Jul 14;28(8):113114. doi: 10.1016/j.isci.2025.113114. eCollection 2025 Aug 15.

Social Microbial Transmission in a Solitary Mammal.

Ecol Lett. 2025 Aug;28(8):e70186. doi: 10.1111/ele.70186.

characterization of amino acid digestibility and fermentative properties of a specific hydrolyzed yeast, and its effects on growth performance and fecal microbiota in weanling piglets.

Front Nutr. 2025 Jul 14;12:1596561. doi: 10.3389/fnut.2025.1596561. eCollection 2025.

Predicting gene distribution in ammonia-oxidizing archaea using phylogenetic signals.

ISME Commun. 2025 May 23;5(1):ycaf087. doi: 10.1093/ismeco/ycaf087. eCollection 2025 Jan.

Exercise-induced microbiota metabolite enhances CD8 T cell antitumor immunity promoting immunotherapy efficacy.

Cell. 2025 Jul 4. doi: 10.1016/j.cell.2025.06.018.

Spatiotemporal characterization of the dynamic changes in the intestinal microbiota of Taihe Silky Fowl.

Anim Microbiome. 2025 Jul 4;7(1):72. doi: 10.1186/s42523-025-00426-8.

Harnessing conductive materials to reshape sewer microbiomes and mitigate corrosion from sulfide and hydrogen sulfide formation.

Sci Rep. 2025 Jul 1;15(1):21382. doi: 10.1038/s41598-025-06099-2.

Metagenomic insights to bacterial communities, functional traits, and soil health in banana smallholder agroecosystems of Kenya.

Front Microbiol. 2025 May 30;16:1582271. doi: 10.3389/fmicb.2025.1582271. eCollection 2025.
