Levy Karin Eli, Shkedy Dafna, Ashkenazy Haim, Cartwright Reed A, Pupko Tal
Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel.
Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel.
Genome Biol Evol. 2017 May 1;9(5):1280-1294. doi: 10.1093/gbe/evx084.
The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably because of the difficulty of writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need for an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and those of each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
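The three steps above follow the general rejection-ABC pattern: summarize the observed data, draw parameters from the prior, simulate, summarize each simulation, and retain the parameter draws whose statistics lie closest to the observed ones. The following is a minimal Python sketch of that pattern under loudly stated assumptions. The simulator `simulate_indels` (a single sequence in which each site independently starts an indel, with geometric indel lengths), the two summary statistics (indel count and mean indel length), the uniform priors, and the standardized Euclidean distance are all hypothetical illustrative choices. They are not SpartaABC's indel model, its phylogenetic sequence simulator, its summary statistics, or its distance function.

```python
import math
import random
import statistics


# HYPOTHETICAL toy simulator (not SpartaABC's): each of `length` sites
# independently starts an indel with probability `rate`; indel lengths are
# geometric with mean `mean_len` (success probability p = 1 / mean_len).
def simulate_indels(rate, mean_len, length, rng):
    p = 1.0 / mean_len
    lengths = []
    for _ in range(length):
        if rng.random() < rate:
            n = 1
            while rng.random() >= p:  # extend the indel with probability 1 - p
                n += 1
            lengths.append(n)
    return lengths


# HYPOTHETICAL summary statistics: number of indels and mean indel length.
def summarize(lengths):
    count = float(len(lengths))
    mean = statistics.fmean(lengths) if lengths else 0.0
    return (count, mean)


def abc_rejection(observed, length, n_sims=5000, keep=100, seed=1):
    rng = random.Random(seed)
    s_obs = summarize(observed)  # step 1: statistics of the input data

    sims = []
    for _ in range(n_sims):
        # Step 2: draw (rate, mean_len) from assumed uniform priors.
        theta = (rng.uniform(0.0, 0.1), rng.uniform(1.0, 10.0))
        # Step 3: simulate and summarize.
        sims.append((theta, summarize(simulate_indels(*theta, length, rng))))

    # Scale each statistic by its standard deviation across simulations so
    # that no single statistic dominates the Euclidean distance.
    scales = [
        statistics.pstdev([s[j] for _, s in sims]) or 1.0
        for j in range(len(s_obs))
    ]

    def dist(s):
        return math.sqrt(sum(((a - b) / c) ** 2
                             for a, b, c in zip(s, s_obs, scales)))

    # Keep the `keep` closest simulations as an approximate posterior sample.
    sims.sort(key=lambda ts: dist(ts[1]))
    accepted = [theta for theta, _ in sims[:keep]]
    # Point estimate: posterior mean of each parameter over accepted draws.
    point = tuple(statistics.fmean([t[j] for t in accepted]) for j in range(2))
    return accepted, point


if __name__ == "__main__":
    observed = simulate_indels(0.03, 4.0, 500, random.Random(42))
    accepted, (rate_hat, len_hat) = abc_rejection(observed, length=500)
    print(f"rate ~ {rate_hat:.4f}, mean indel length ~ {len_hat:.2f}")
```

Retaining a fixed number of closest simulations, rather than every simulation within a fixed distance threshold, is one common way to set the ABC tolerance; the accepted draws approximate the posterior, and their mean serves as a point estimate.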