AliSim-HPC: parallel sequence simulator for phylogenetics.

Affiliation

School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2600, Australia.

Publication information

Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad540.

DOI:10.1093/bioinformatics/btad540
PMID:37656933
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10534053/
Abstract

MOTIVATION

Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation.

RESULTS

This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable, which reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.
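
The headline figures above (11 min, 153-fold speedup, 256 CPU cores) can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and because the 11-minute figure is rounded in the abstract, the implied sequential runtime is approximate. Parallel efficiency is taken here as speedup divided by core count, which is a common convention and not a figure reported in the abstract.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Function names are illustrative and not part of AliSim-HPC.

def implied_sequential_minutes(parallel_minutes: float, speedup: float) -> float:
    """Sequential runtime implied by a parallel runtime and a speedup factor."""
    return parallel_minutes * speedup


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved (speedup per core)."""
    return speedup / cores


# Figures from the abstract: 11 min on 256 cores, 153-fold speedup.
seq_minutes = implied_sequential_minutes(11, 153)
seq_hours = seq_minutes / 60
efficiency = parallel_efficiency(153, 256)

print(f"implied sequential runtime: {seq_minutes:.0f} min (about {seq_hours:.1f} h)")
print(f"parallel efficiency on 256 cores: {efficiency:.1%}")
```

The implied sequential runtime of roughly 28 hours agrees with the abstract's "over one day", and the speedup corresponds to about 60% of ideal linear scaling on 256 cores.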

AVAILABILITY AND IMPLEMENTATION

AliSim-HPC is open-source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.
