Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation.

Authors

Levy Karin Eli, Shkedy Dafna, Ashkenazy Haim, Cartwright Reed A, Pupko Tal

Affiliations

Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel.

Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel.

Publication information

Genome Biol Evol. 2017 May 1;9(5):1280-1294. doi: 10.1093/gbe/evx084.

DOI:10.1093/gbe/evx084
PMID:28453624
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5438127/
Abstract

The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
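
The three steps described in the abstract (summarize the input, sample parameters from a prior and simulate, then compare summary statistics by a distance) follow the general ABC rejection scheme. The Python sketch below illustrates that scheme on a deliberately simplified toy model. It is not the SpartaABC implementation: the simulator, the two summary statistics, the uniform priors, the scaled Euclidean distance, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def simulate_gap_lengths(rate, p, seq_len, rng):
    """Toy simulator (an assumption, not SpartaABC's): every site opens a
    gap with probability `rate`; gap lengths are geometric with mean 1/p."""
    lengths = []
    for _ in range(seq_len):
        if rng.random() < rate:
            length = 1
            while rng.random() > p:
                length += 1
            lengths.append(length)
    return lengths


def summary_stats(lengths, seq_len):
    """Two illustrative summary statistics: gaps per site, mean gap length."""
    n = len(lengths)
    mean_len = sum(lengths) / n if n else 0.0
    return (n / seq_len, mean_len)


def distance(s1, s2, scales):
    """Euclidean distance after dividing each statistic by a scale factor."""
    return math.sqrt(sum(((a - b) / w) ** 2 for a, b, w in zip(s1, s2, scales)))


def abc_rejection(observed, seq_len, n_sims, n_keep, rng):
    """Return the n_keep prior draws whose simulated statistics are closest
    to the observed ones; they approximate the posterior distribution."""
    draws = []
    for _ in range(n_sims):
        rate = rng.uniform(0.0, 0.1)   # hypothetical prior on the gap rate
        p = rng.uniform(0.1, 0.9)      # hypothetical prior on the length parameter
        sim = summary_stats(simulate_gap_lengths(rate, p, seq_len, rng), seq_len)
        draws.append((sim, (rate, p)))
    # Scale each statistic by its standard deviation across all simulations
    # so that no single statistic dominates the distance.
    scales = []
    for k in range(len(observed)):
        vals = [stats[k] for stats, _ in draws]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
        scales.append(sd or 1.0)
    ranked = sorted(draws, key=lambda d: distance(observed, d[0], scales))
    return [params for _, params in ranked[:n_keep]]


def posterior_mean(accepted):
    """Point estimate: the mean of each parameter over the accepted draws."""
    n = len(accepted)
    return tuple(sum(params[i] for params in accepted) / n
                 for i in range(len(accepted[0])))


if __name__ == "__main__":
    rng = random.Random(0)
    seq_len = 1000
    # Stand-in for real input data: simulate it with known parameter values.
    observed = summary_stats(simulate_gap_lengths(0.03, 0.4, seq_len, rng), seq_len)
    accepted = abc_rejection(observed, seq_len, n_sims=2000, n_keep=40, rng=rng)
    print("posterior mean (rate, p):", posterior_mean(accepted))
```

Keeping a fixed number of the closest simulations, rather than applying a fixed distance threshold, is one common design choice: it guarantees a posterior sample of known size, at the cost of an acceptance radius that depends on how many simulations are run.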
