Making Use of Averaging Methods in MODELLER for Protein Structure Prediction.

Author information

Department of Biochemical Sciences, Sapienza University of Rome, 00185 Rome, Italy.

Publication information

Int J Mol Sci. 2024 Jan 31;25(3):1731. doi: 10.3390/ijms25031731.

DOI:10.3390/ijms25031731

PMID:38339009

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10855553/

Abstract

Recent advances in protein structure prediction, driven by AlphaFold 2 and machine learning, demonstrate proficiency in static structures but encounter challenges in capturing the dynamic features that are crucial for understanding biological function. In this context, homology-based modeling emerges as a cost-effective and computationally efficient alternative. The MODELLER (version 10.5, accessed on 30 November 2023) algorithm can be harnessed for this purpose because it computes intermediate models during simulated annealing, enabling the exploration of attainable configurational states and energies while minimizing its objective function. Only a few attempts have been made to date to improve the models generated by its algorithm; in particular, there is no literature regarding the implementation of an averaging procedure involving the intermediate models in the MODELLER algorithm. In this study, we examined MODELLER's output using 225 target-template pairs, extracting the best representatives of intermediate models. Applying an averaging procedure to the selected intermediate structures based on statistical potentials, we aimed to determine: (1) whether averaging improves the quality of structural models during the building phase; (2) whether ranking by statistical potentials reliably selects the best models, leading to improved final model quality; (3) whether using a single template versus multiple templates affects the averaging approach; (4) whether the "ensemble" nature of the MODELLER building phase can be harnessed to capture low-energy conformations in holo structure modeling. Our findings indicate that while improvements typically amount to less than a few decimal points in the model evaluation metric, a notable fraction of configurations exhibit slightly higher similarity to the native structure than MODELLER's proposed final model. The averaging-building procedure proves particularly beneficial in (1) regions of low sequence identity between the target and template(s), the most challenging aspect of homology modeling; (2) the generation of holo protein conformations, an area in which MODELLER and related tools usually fall short of the expected performance.
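
The select-then-average idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not call the MODELLER API: the function name `average_top_models`, the parameter `k`, and the toy data are all hypothetical. It assumes each intermediate model is given as a list of (x, y, z) coordinates with identical atom order, that all models are already superimposed in a common frame, and that each model has a statistical-potential score where lower means better.

```python
from statistics import fmean


def average_top_models(models, scores, k=3):
    """Average the coordinates of the k best-scoring models.

    Assumptions (hypothetical, for illustration only):
    - models: list of conformations, each a list of (x, y, z) tuples
      with the same atoms in the same order, already superimposed;
    - scores: one statistical-potential score per model, lower is better.
    """
    if len(models) != len(scores):
        raise ValueError("models and scores must have the same length")
    if not 1 <= k <= len(models):
        raise ValueError("k must be between 1 and the number of models")

    # Rank model indices from best (lowest score) to worst.
    ranked = sorted(range(len(models)), key=lambda i: scores[i])
    selected = [models[i] for i in ranked[:k]]

    n_atoms = len(selected[0])
    if any(len(m) != n_atoms for m in selected):
        raise ValueError("all models must contain the same number of atoms")

    # Per-atom, per-axis arithmetic mean over the selected models.
    return [
        tuple(fmean(m[atom][axis] for m in selected) for axis in range(3))
        for atom in range(n_atoms)
    ]


# Toy example: three two-atom "models"; the third has the worst score.
toy_models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
]
toy_scores = [-120.0, -118.0, -40.0]
averaged = average_top_models(toy_models, toy_scores, k=2)
```

With `k=2` the poorly scored third model is excluded, so the result is the mean of the first two. Plain coordinate averaging can distort local geometry such as bond lengths, so an averaged structure would likely need further refinement before it is evaluated against the native structure.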
