AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era.

Affiliations

School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, ACT 2600, Australia.

Ecology and Evolution, Research School of Biology, College of Science, Australian National University, Canberra, ACT 2600, Australia.

Publication information

Mol Biol Evol. 2022 May 3;39(5). doi: 10.1093/molbev/msac092.

DOI: 10.1093/molbev/msac092
PMID: 35511713
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9113491/
Abstract

Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB of RAM to simulate alignments with one million sequences or sites, whereas the popular programmes Seq-Gen, Dawg, and INDELible require 2-5 h and 50-500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.
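
The abstract attributes AliSim's speed to combining two standard ways of evolving a sequence along a single branch. As a rough illustration only, the Python sketch below implements both under the simplest nucleotide model, JC69 (an assumption made to keep the code short; AliSim supports far richer models). The rate-matrix approach simulates individual substitutions with exponential waiting times, so its cost grows with the number of substitutions; the probability-matrix approach draws each site's end state directly from P(t) = exp(Qt), so its cost grows with sequence length. All function names, the dictionary tree format, and the SHORT_BRANCH_THRESHOLD cut-off are hypothetical and do not reflect AliSim's code or its actual switching rule.

```python
import math
import random

NUCLEOTIDES = "ACGT"

# Hypothetical cut-off (expected substitutions per site) for switching approaches;
# chosen for illustration only, not AliSim's actual criterion.
SHORT_BRANCH_THRESHOLD = 0.1


def jc69_transition_probs(t):
    """JC69 transition probabilities for a branch of length t.

    t is measured in expected substitutions per site. Returns (p_same, p_other):
    the probability of ending in the starting state, and of ending in each one
    of the three other states.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def evolve_probability_matrix(seq, t, rng):
    """Probability-matrix approach: draw every site's end state from P(t)."""
    p_same, _ = jc69_transition_probs(t)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            # Under JC69, given a change, the three other states are equally likely.
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(out)


def evolve_rate_matrix(seq, t, rng):
    """Rate-matrix approach: simulate substitution events one at a time.

    With JC69 normalised to one expected substitution per unit time, each site
    leaves its state at rate 1, so the whole sequence has total rate len(seq)
    and waiting times between events are exponential with that rate.
    """
    sites = list(seq)
    if not sites:
        return ""
    total_rate = float(len(sites))
    elapsed = rng.expovariate(total_rate)
    while elapsed < t:
        i = rng.randrange(len(sites))  # every site is equally likely to mutate
        sites[i] = rng.choice([b for b in NUCLEOTIDES if b != sites[i]])
        elapsed += rng.expovariate(total_rate)
    return "".join(sites)


def evolve_adaptive(seq, t, rng):
    """Event-by-event on short branches, direct draws from P(t) on long ones."""
    if t < SHORT_BRANCH_THRESHOLD:
        return evolve_rate_matrix(seq, t, rng)
    return evolve_probability_matrix(seq, t, rng)


def simulate_along_tree(tree, root, root_seq, rng):
    """Evolve root_seq down a rooted tree and return the leaf sequences.

    tree maps each internal node name to a list of (child_name, branch_length).
    """
    leaves = {}
    stack = [(root, root_seq)]
    while stack:
        node, seq = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaves[node] = seq
        for child, length in children:
            stack.append((child, evolve_adaptive(seq, length, rng)))
    return leaves


if __name__ == "__main__":
    rng = random.Random(2022)
    # A toy three-leaf tree; node names and branch lengths are made up.
    tree = {
        "root": [("A", 0.05), ("inner", 0.2)],
        "inner": [("B", 0.3), ("C", 0.02)],
    }
    root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(30))
    alignment = simulate_along_tree(tree, "root", root_seq, rng)
    for name in sorted(alignment):
        print(f">{name}\n{alignment[name]}")
```

Because event-by-event simulation does work roughly proportional to len(seq) × t, it is cheap on short branches and expensive on long ones, while the per-site draw from P(t) always costs about len(seq); switching on branch length is one plausible way to exploit that trade-off. For real analyses, AliSim itself is run as part of IQ-TREE 2.2, as described in the tutorial at http://www.iqtree.org/doc/AliSim.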

Similar articles

1
AliSim: A Fast and Versatile Phylogenetic Sequence Simulator for the Genomic Era.
Mol Biol Evol. 2022 May 3;39(5). doi: 10.1093/molbev/msac092.
2
AliSim-HPC: parallel sequence simulator for phylogenetics.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad540.
3
IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.
Mol Biol Evol. 2020 May 1;37(5):1530-1534. doi: 10.1093/molbev/msaa015.
4
DNA assembly with gaps (Dawg): simulating sequence evolution.
Bioinformatics. 2005 Nov 1;21 Suppl 3:iii31-8. doi: 10.1093/bioinformatics/bti1200.
5
indel-Seq-Gen: a new protein family simulator incorporating domains, motifs, and indels.
Mol Biol Evol. 2007 Mar;24(3):640-9. doi: 10.1093/molbev/msl195. Epub 2006 Dec 8.
6
QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution.
Syst Biol. 2021 Aug 11;70(5):1046-1060. doi: 10.1093/sysbio/syab010.
7
Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees.
Comput Appl Biosci. 1997 Jun;13(3):235-8. doi: 10.1093/bioinformatics/13.3.235.
8
INDELible: a flexible simulator of biological sequence evolution.
Mol Biol Evol. 2009 Aug;26(8):1879-88. doi: 10.1093/molbev/msp098. Epub 2009 May 7.
9
GHOST: Recovering Historical Signal from Heterotachously Evolved Sequence Alignments.
Syst Biol. 2020 Mar 1;69(2):249-264. doi: 10.1093/sysbio/syz051.
10
IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.
Mol Biol Evol. 2015 Jan;32(1):268-74. doi: 10.1093/molbev/msu300. Epub 2014 Nov 3.

Cited by

1
UShER-TB: Scalable, Comprehensive, Accessible Phylogenomic Analysis of .
medRxiv. 2025 Jul 23:2025.07.22.25331806. doi: 10.1101/2025.07.22.25331806.
2
The subordinate role of pseudogenization to recombinative deletion following polyploidization in angiosperms.
Nat Commun. 2025 Jul 9;16(1):6335. doi: 10.1038/s41467-025-61676-3.
3
CONSTRUCT: an algorithmic tool for identifying functional or structurally important regions in protein tertiary structure.
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf166.
4
The impact of software and criteria on the selection of best-fit nucleotide substitution models for molecular evolutionary genetic analysis.
PLoS One. 2025 Mar 26;20(3):e0319774. doi: 10.1371/journal.pone.0319774. eCollection 2025.
5
Phyloformer: Fast, Accurate, and Versatile Phylogenetic Reconstruction with Deep Neural Networks.
Mol Biol Evol. 2025 Apr 1;42(4). doi: 10.1093/molbev/msaf051.
6
Learning genotype-phenotype associations from gaps in multi-species sequence alignments.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf022.
7
nT4X and nT4M: Novel Time Non-reversible Mixture Amino Acid Substitution Models.
J Mol Evol. 2025 Feb;93(1):136-148. doi: 10.1007/s00239-024-10230-8. Epub 2025 Jan 20.
8
CAT-Posterior Mean Site Frequencies Improves Phylogenetic Modeling Under Maximum Likelihood and Resolves Tardigrada as the Sister of Arthropoda Plus Onychophora.
Genome Biol Evol. 2025 Jan 6;17(1). doi: 10.1093/gbe/evae273.
9
MixtureFinder: Estimating DNA Mixture Models for Phylogenetic Analyses.
Mol Biol Evol. 2025 Jan 6;42(1). doi: 10.1093/molbev/msae264.
10
Predicting Phylogenetic Bootstrap Values via Machine Learning.
Mol Biol Evol. 2024 Oct 4;41(10). doi: 10.1093/molbev/msae215.