Hermans Pauline, Tsishyn Matsvei, Schwersensky Martin, Rooman Marianne, Pucci Fabrizio
Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium.
Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium.
Mol Biol Evol. 2025 Jan 6;42(1). doi: 10.1093/molbev/msae267.
Determining the impact of mutations on the thermodynamic stability of proteins is essential for a wide range of applications such as rational protein design and genetic variant interpretation. Since protein stability is a major driver of evolution, evolutionary data are often used to guide stability predictions. Many state-of-the-art stability predictors extract evolutionary information from multiple sequence alignments of proteins homologous to a query protein, and leverage it to predict the effects of mutations on protein stability. To evaluate the power and the limitations of such methods, we used the massive amount of stability data recently obtained by deep mutational scanning to study how best to construct multiple sequence alignments and optimally extract evolutionary information from them. We tested different evolutionary models and found that, unexpectedly, independent-site models achieve similar accuracy to more complex epistatic models. A detailed analysis of the latter models suggests that their inference often results in noisy couplings, which do not appear to add predictive power over the independent-site contribution, at least in the context of stability prediction. Interestingly, by combining any of the evolutionary features with a simple structural feature, the relative solvent accessibility of the mutated residue, we achieved similar prediction accuracy to supervised, machine learning-based, protein stability change predictors. Our results provide new insights into the relationship between protein evolution and stability, and show how evolutionary information can be exploited to improve the performance of mutational stability prediction.
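As a rough illustration of what an independent-site evolutionary score can look like, the sketch below computes a log-odds ratio between the mutant and wild-type amino acid frequencies in a single alignment column. This is a minimal example under stated assumptions, not the authors' method: the function names, the pseudocount value, the toy alignment, and the sign convention (negative meaning the mutant residue is rarer than the wild type) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_frequencies(msa, position, pseudocount=1.0):
    """Pseudocount-smoothed amino acid frequencies in one alignment column.

    Gaps and non-standard characters are ignored (an assumption of this sketch).
    """
    column = [seq[position] for seq in msa if seq[position] in AMINO_ACIDS]
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def independent_site_score(msa, position, wild_type, mutant, pseudocount=1.0):
    """Log-odds of mutant versus wild-type frequency at a single site.

    Each column is treated independently, so no couplings between
    positions (epistasis) enter the score.
    """
    freqs = site_frequencies(msa, position, pseudocount)
    return math.log(freqs[mutant] / freqs[wild_type])

# Hypothetical toy alignment; a real MSA would contain many homologous sequences.
toy_msa = ["MKVL", "MKIL", "MRVL", "MKVL", "M-VL"]

# V -> I is observed in the column, V -> W is not.
score_conservative = independent_site_score(toy_msa, 2, "V", "I")
score_unseen = independent_site_score(toy_msa, 2, "V", "W")
print(score_conservative, score_unseen)
```

In this toy case the substitution to an amino acid never seen in the column receives a lower score than the substitution to one that is observed. How such a score would be combined with the relative solvent accessibility of the mutated residue is not specified in the abstract and is left out here.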