University of Vienna, Vienna, Austria.
Mol Biol Evol. 2012 Sep;29(9):2133-45. doi: 10.1093/molbev/mss078. Epub 2012 Mar 1.
Simulating the change of protein sequences over time in a biologically realistic way is fundamental for a broad range of studies with a focus on evolution. It is therefore problematic that simulators typically evolve the individual sites of a sequence identically and independently. More realistic simulations are possible; however, they are often precluded by limited knowledge of site-specific evolutionary constraints or of functional dependencies between amino acids. As a consequence, a protein's functional and structural characteristics are rapidly lost in the course of simulated evolution. Here, we present REvolver (www.cibiv.at/software/revolver), a program that simulates protein sequence alteration such that evolutionarily stable sequence characteristics, like functional domains, are maintained. For this purpose, REvolver uses profile hidden Markov models (pHMMs) to parameterize site-specific models of sequence evolution in an automated fashion. pHMMs derived from alignments of homologous proteins or protein domains capture which sequence sites remained conserved over time and where in a sequence insertions or deletions are more likely to occur. Thus, they describe constraints on the evolutionary process acting on these sequences. To demonstrate the performance of REvolver as well as its applicability in large-scale simulation studies, we evolved the entire human proteome up to 1.5 expected substitutions per site. Simultaneously, we analyzed the preservation of Pfam and SMART domains in the simulated sequences over time. REvolver preserved 92% of the Pfam domains originally present in the human sequences. This value drops to 15% when traditional models of amino acid sequence evolution are used. Thus, REvolver represents a significant advance toward a realistic simulation of protein sequence evolution on a proteome-wide scale. Further, REvolver facilitates the simulation of a protein family with a user-defined domain architecture at the root.
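The idea of letting a pHMM supply site-specific constraints can be illustrated with a minimal sketch. It is not REvolver's implementation. It assumes each site carries its own emission distribution (as a pHMM match state would) and uses a simple F81-style rule as a stand-in model: a residue is kept with probability exp(-t) and otherwise redrawn from that site's distribution. The names `evolve_site` and `evolve_sequence`, the toy profile, and the time unit (redraw events, not calibrated to expected substitutions per site) are all hypothetical, and insertions and deletions are ignored.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve_site(residue, emission, t, rng):
    """Evolve one site for time t under an F81-style rule (assumed model).

    emission maps amino acids to probabilities for this site only, in the
    role of a pHMM match-state emission distribution.
    """
    # With probability exp(-t) no redraw event hits the site.
    if rng.random() < math.exp(-t):
        return residue
    # Otherwise redraw from the site's own distribution; a conserved site
    # will usually draw the same residue again.
    letters = list(emission)
    weights = [emission[a] for a in letters]
    return rng.choices(letters, weights=weights, k=1)[0]


def evolve_sequence(seq, profile, t, rng):
    """Evolve every site independently, each with its own distribution."""
    if len(seq) != len(profile):
        raise ValueError("profile must have one distribution per site")
    return "".join(
        evolve_site(res, emission, t, rng)
        for res, emission in zip(seq, profile)
    )


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy profile: two strongly conserved cysteines around two free sites.
    uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    conserved_cys = {"C": 0.97, "S": 0.03}
    profile = [conserved_cys, uniform, uniform, conserved_cys]
    print(evolve_sequence("CAKC", profile, 1.5, rng))
```

Passing the same `uniform` distribution for every site gives the site-homogeneous behaviour the abstract criticizes, where conserved positions drift as quickly as unconstrained ones.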