Ferreiro David, González-Vázquez Luis Daniel, Prado-Comesaña Ana, Arenas Miguel
CINBIO, Universidade de Vigo, Vigo, Spain.
Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, Vigo, Spain.
Elife. 2025 Sep 24;14:RP106365. doi: 10.7554/eLife.106365.
Evolutionary studies in population genetics and ecology have mainly focused on predicting and understanding past evolutionary events. Recently, however, there has been growing interest in predicting future evolutionary trajectories, motivated by the wide variety of applications of such forecasts. In this context, we introduce a method for forecasting protein evolution that integrates birth-death population models with substitution models that consider selection on protein folding stability. Traditional population genetics methods usually make the unrealistic assumption that molecular evolution can be simulated separately from the evolutionary history. In contrast, the present method combines both processes, simultaneously modeling forward-in-time birth-death evolutionary trajectories and protein evolution under structurally constrained substitution models, which have outperformed traditional empirical substitution models. We implemented the method in a freely available computer framework. We evaluated the accuracy of the predictions using several monitored viral proteins of broad interest. Overall, the method showed acceptable errors in predicting the folding stability of the forecasted protein variants but, as expected, larger errors in predicting the corresponding sequences. We conclude that forecasting protein evolution is feasible in certain evolutionary scenarios, and we provide suggestions to enhance its accuracy by improving the underlying models of evolution.
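To make the idea of coupling the two processes concrete, the following is a minimal toy sketch, not the authors' framework. It runs a forward-in-time birth-death process in which each lineage carries a protein sequence and the birth rate of a lineage scales with a stability score. Everything here is an assumption made for illustration: the function names (`toy_stability`, `mutate`, `simulate`), the parameters, the rule that only the birth rate depends on stability, and above all the stability proxy, which simply counts hydrophobic residues at hypothetical buried sites and stands in for a real structure-based folding stability model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # crude grouping, assumed for this toy example


def toy_stability(seq, buried):
    """Hypothetical stability proxy: fraction of buried sites that are hydrophobic."""
    return sum(1 for i in buried if seq[i] in HYDROPHOBIC) / len(buried)


def mutate(seq, rng):
    """Return a copy of seq with exactly one amino acid substitution."""
    pos = rng.randrange(len(seq))
    new = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]


def simulate(root, buried, t_max, base_birth=1.0, death=0.5,
             mut_prob=0.3, max_pop=200, seed=0):
    """Forward-in-time birth-death process with stability-dependent births.

    Assumption of this sketch: a lineage's birth rate is
    base_birth * toy_stability(sequence), while the death rate is constant.
    Returns the sequences alive when t_max or max_pop is reached.
    """
    rng = random.Random(seed)
    pop = [root]
    t = 0.0
    while pop and len(pop) < max_pop:
        birth_rates = [base_birth * toy_stability(s, buried) for s in pop]
        total_birth = sum(birth_rates)
        total = total_birth + death * len(pop)
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < total_birth / total:
            # Birth: more stable variants are more likely to reproduce.
            parent = rng.choices(pop, weights=birth_rates)[0]
            child = mutate(parent, rng) if rng.random() < mut_prob else parent
            pop.append(child)
        else:
            # Death: remove a lineage chosen uniformly at random.
            pop.pop(rng.randrange(len(pop)))
    return pop


if __name__ == "__main__":
    survivors = simulate("AVILKDEK", buried=[0, 1, 2, 3], t_max=3.0, seed=42)
    print(len(survivors), "lineages alive at the end of the simulation")
```

Because substitutions arise during the birth-death process and feed back into the birth rates, the sequences and the population history are generated together rather than in two separate steps, which is the coupling the abstract describes. A realistic implementation would replace the proxy with a structure-based stability model and a structurally constrained substitution model.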