Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
Nucleic Acids Res. 2017 Jul 3;45(W1):W453-W457. doi: 10.1093/nar/gkx322.
Many analyses aimed at detecting biological phenomena take a multiple sequence alignment as input. The results of such analyses are often examined further with parametric bootstrap procedures that rely on sequence simulators. One problem in conducting such simulation studies is that users currently have no means of deciding which insertion and deletion (indel) parameters to choose so that the simulated sequences mimic biological data. Here, we present SpartaABC, a web server that aims to solve this issue. SpartaABC implements an approximate Bayesian computation (ABC) rejection algorithm to infer indel parameters from sequence data. It first extracts summary statistics from the input and then performs numerous sequence simulations under randomly sampled indel parameters. By computing a distance between the summary statistics of the input and those of each simulation, SpartaABC retains only the parameter values whose simulations are close to the real data. As output, SpartaABC provides point estimates and approximate posterior distributions of the indel parameters. In addition, SpartaABC allows simulating sequences with the inferred indel parameters; to this end, the sequence simulators Dawg 2.0 and INDELible were integrated into the server. Using SpartaABC, we demonstrate the differences in indel dynamics among three protein-coding genes across mammalian orthologs. SpartaABC is freely available for use at http://spartaabc.tau.ac.il/webserver.
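The rejection procedure summarized above can be illustrated with a generic sketch. This is a minimal Python example of ABC rejection under stated assumptions, not SpartaABC's implementation: the hypothetical `toy_simulate` function stands in for Dawg 2.0 or INDELible and returns only indel lengths rather than aligned sequences. The two parameters (an indel rate per site and a mean indel length), their uniform priors, the two summary statistics, the standard-deviation-scaled Euclidean distance, and the 2% acceptance fraction are all illustrative choices.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Knuth's method for a Poisson draw (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def geometric(rng, p):
    """Number of Bernoulli(p) trials up to and including the first success (>= 1)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def toy_simulate(indel_rate, mean_len, root_len, rng):
    """HYPOTHETICAL stand-in for a real simulator: returns a list of indel lengths."""
    n_events = poisson(rng, indel_rate * root_len)
    return [geometric(rng, 1.0 / mean_len) for _ in range(n_events)]


def summary_stats(indel_lengths):
    """Illustrative summary statistics: number of indels and their mean length."""
    n = len(indel_lengths)
    mean_len = statistics.fmean(indel_lengths) if n else 0.0
    return (float(n), mean_len)


def abc_rejection(observed, n_sims=5000, keep_frac=0.02, root_len=300, seed=1):
    """Generic ABC rejection: keep the parameters of the closest simulations."""
    rng = random.Random(seed)
    params, stats = [], []
    for _ in range(n_sims):
        # Sample parameters from ASSUMED uniform priors.
        rate = rng.uniform(0.0, 0.1)
        mean_len = rng.uniform(1.0, 10.0)
        params.append((rate, mean_len))
        stats.append(summary_stats(toy_simulate(rate, mean_len, root_len, rng)))

    # Scale each statistic by its standard deviation across simulations.
    scales = [statistics.pstdev(col) or 1.0 for col in zip(*stats)]

    def distance(s):
        return math.sqrt(sum(((a - b) / w) ** 2 for a, b, w in zip(s, observed, scales)))

    # Rank simulations by distance to the observed statistics; keep the closest ones.
    ranked = sorted(zip(params, stats), key=lambda ps: distance(ps[1]))
    n_keep = max(1, int(keep_frac * n_sims))
    accepted = [p for p, _ in ranked[:n_keep]]

    # Accepted values approximate the posterior; their mean serves as a point estimate.
    estimate = tuple(statistics.fmean(col) for col in zip(*accepted))
    return estimate, accepted


if __name__ == "__main__":
    # "Observed" data generated here from known toy parameters, for illustration only.
    observed = summary_stats(toy_simulate(0.03, 4.0, 300, random.Random(42)))
    (rate_hat, len_hat), accepted = abc_rejection(observed)
    print(f"rate ~ {rate_hat:.3f}, mean length ~ {len_hat:.2f}, kept {len(accepted)}")
```

Keeping a fixed fraction of the closest simulations, rather than fixing a distance threshold in advance, is one common way to set the ABC tolerance; the retained parameter values form the approximate posterior sample.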