
phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets.

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom.

Department of Genetics, University of Cambridge, Cambridge, United Kingdom.

Publication information

PLoS Comput Biol. 2022 Apr 29;18(4):e1010056. doi: 10.1371/journal.pcbi.1010056. eCollection 2022 Apr.

DOI:10.1371/journal.pcbi.1010056
PMID:35486906
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9094560/
Abstract

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
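
The idea of pairing a Gillespie simulation with a search tree over per-site rates can be illustrated with a minimal Python sketch. This is not phastSim's actual code or API: the `RateTree` class, the `simulate_branch` function, the `EXIT_RATE` table and the uniform choice of the new base are all hypothetical simplifications chosen for illustration. The sketch stores the per-site mutation rates in a binary sum tree, so that picking the next site to mutate and updating its rate each take a number of steps logarithmic in the genome length, and only the few sites that actually mutate on a short branch are touched.

```python
import random

# Hypothetical per-base exit rates (substitutions per unit branch length).
EXIT_RATE = {"A": 1.0, "C": 2.0, "G": 1.5, "T": 0.5}


class RateTree:
    """Binary sum tree over per-site rates (illustrative helper, not phastSim's)."""

    def __init__(self, rates):
        self.n = len(rates)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        # Leaves sit at indices size..2*size-1; node i holds the sum of its children.
        self.tree = [0.0] * (2 * self.size)
        for i, r in enumerate(rates):
            self.tree[self.size + i] = r
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]

    def update(self, site, rate):
        """Set the rate of one site and refresh the sums on the path to the root."""
        i = self.size + site
        self.tree[i] = rate
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, u):
        """Return the site whose cumulative-rate interval contains u (0 <= u < total)."""
        i = 1
        while i < self.size:
            left = self.tree[2 * i]
            if u < left:
                i = 2 * i
            else:
                u -= left
                i = 2 * i + 1
        # Guard against rounding pushing u into a zero-rate padding leaf.
        return min(i - self.size, self.n - 1)


def simulate_branch(seq, rate_tree, branch_length, rng):
    """Gillespie simulation along one branch; mutates seq in place.

    Returns a list of (time, site, old_base, new_base) events.
    """
    t = 0.0
    events = []
    while rate_tree.total() > 0.0:
        total = rate_tree.total()
        t += rng.expovariate(total)      # waiting time to the next event
        if t > branch_length:
            break
        site = rate_tree.sample(rng.random() * total)
        old = seq[site]
        # Simplification: the new base is drawn uniformly from the other three.
        new = rng.choice([b for b in "ACGT" if b != old])
        seq[site] = new
        rate_tree.update(site, EXIT_RATE[new])
        events.append((t, site, old, new))
    return events


if __name__ == "__main__":
    rng = random.Random(42)
    genome = list("ACGT" * 250)
    tree = RateTree([EXIT_RATE[b] for b in genome])
    for event in simulate_branch(genome, tree, 0.001, rng):
        print(event)
```

With short branches the expected number of events per branch is small, so the per-branch cost in this sketch is dominated by a handful of logarithmic-time tree operations rather than a pass over the whole genome.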

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/6b4dffc7f109/pcbi.1010056.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/ee15c30f85e7/pcbi.1010056.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/59f0e650a5b9/pcbi.1010056.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/d31b09b42a8d/pcbi.1010056.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/603257f43c51/pcbi.1010056.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/64064255de88/pcbi.1010056.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6384/9094560/b5f7cac26d9a/pcbi.1010056.g007.jpg
