Levy Karin Eli, Rabin Avigayel, Ashkenazy Haim, Shkedy Dafna, Avram Oren, Cartwright Reed A, Pupko Tal
Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel.
Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel; The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.
Genome Biol Evol. 2015 Nov 3;7(12):3226-38. doi: 10.1093/gbe/evv212.
In this study, we present a novel simulation-based methodology to infer indel parameters from multiple sequence alignments (MSAs). Our algorithm searches for the set of evolutionary parameters describing indel dynamics that best fits a given input MSA. In each step of the search, we use parametric bootstraps and the Mahalanobis distance to estimate how well a proposed set of parameters fits the input data. Using simulations, we demonstrate that our methodology can accurately infer the indel parameters for a large variety of plausible settings. Moreover, using our methodology, we show that indel parameters vary substantially among three genomic data sets: mammals, bacteria, and retroviruses. Finally, we demonstrate how our methodology can be used to simulate MSAs based on indel parameters inferred from real data sets.
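The fitting step described in the abstract (simulate under proposed parameters, then score the fit with the Mahalanobis distance) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not name the summary statistics or the search strategy, so the sketch uses a hypothetical toy simulator (`toy_summary_stats`), two made-up statistics (number of indel events and mean indel length), and a plain grid search in place of the authors' search procedure.

```python
import itertools
import numpy as np


def mahalanobis_distance(observed, simulated):
    """Mahalanobis distance between an observed summary-statistic vector
    and the cloud of the same statistics over simulated replicates."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    mean = simulated.mean(axis=0)
    cov = np.atleast_2d(np.cov(simulated, rowvar=False))
    # The pseudo-inverse keeps the computation defined if cov is singular.
    inv_cov = np.linalg.pinv(cov)
    diff = observed - mean
    return float(np.sqrt(diff @ inv_cov @ diff))


def toy_summary_stats(rate, p, rng, n_sites=200):
    """Hypothetical stand-in for an indel simulator (NOT the paper's model).

    Draws a Poisson number of indel events and geometric indel lengths,
    and returns [number of events, mean indel length].
    """
    n_events = rng.poisson(rate * n_sites)
    lengths = rng.geometric(p, size=n_events)
    mean_len = float(lengths.mean()) if n_events > 0 else 0.0
    return np.array([float(n_events), mean_len])


def grid_search(observed, rates, ps, n_reps=200, seed=0):
    """Return the (rate, p) pair whose simulated replicates lie closest
    to the observed statistics. A grid search is an assumption here."""
    rng = np.random.default_rng(seed)
    best, best_d = None, np.inf
    for rate, p in itertools.product(rates, ps):
        sims = np.array(
            [toy_summary_stats(rate, p, rng) for _ in range(n_reps)]
        )
        d = mahalanobis_distance(observed, sims)
        if d < best_d:
            best, best_d = (rate, p), d
    return best, best_d


if __name__ == "__main__":
    # Pretend these statistics were measured on an input MSA.
    observed = np.array([10.0, 2.0])
    best, dist = grid_search(observed, [0.01, 0.05, 0.2], [0.2, 0.5, 0.8])
    print("best (rate, p):", best, "distance:", round(dist, 3))
```

Because the Mahalanobis distance scales each statistic by its simulated variance and accounts for correlations between statistics, statistics measured on very different scales can be combined into a single score without manual weighting.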