Department of Computer and Information Sciences, University of Delaware, Smith Hall, 18 Amstel Avenue, Newark, DE 19716, USA.
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 516 Jun Gong Road, Shanghai 200093, China.
Int J Mol Sci. 2024 May 11;25(10):5247. doi: 10.3390/ijms25105247.
Residue contact maps provide a condensed two-dimensional representation of three-dimensional protein structures. They serve not only as a foundational framework in structural modeling but also as an effective tool in their own right for identifying inter-helical binding sites and drawing insights about protein function. Because contact maps have been treated primarily as an intermediate step toward 3D structure prediction, contact prediction methods have limited themselves exclusively to sequential features. Now that AlphaFold2 predicts 3D structures with generally good accuracy, we examine (1) how well predicted 3D structures can be used directly to decide residue contacts, and (2) whether features from 3D structures can be leveraged to further improve residue contact prediction. On a well-known benchmark dataset, we tested inter-helical residue contact prediction based on AlphaFold2's predicted structures, which gave an 83% average precision, already outperforming a state-of-the-art model based on sequential features. We then developed a procedure to extract features from the atomic structure in the neighborhood of a residue pair, hypothesizing that these features are useful for determining whether the residue pair is in contact, provided the structure is reasonably accurate, such as one predicted by AlphaFold2. We trained on features generated from experimentally determined structures and tested on the same set of features derived from AlphaFold2 structures, thereby leveraging knowledge from known structures to significantly improve residue contact prediction. Our results demonstrate a remarkable improvement over AlphaFold2, achieving over 91.9% average precision for a held-out subset and over 89.5% average precision in cross-validation experiments.
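As an illustration of question (1), the following minimal Python sketch shows one way residue contacts could be read directly from a 3D structure and scored against a reference. It is not the procedure used in this work: the function names `contact_map` and `precision`, the 8 Å cutoff, the use of one representative atom per residue, and the toy coordinates are all assumptions made for the example.

```python
import numpy as np

def contact_map(coords, threshold=8.0, min_separation=1):
    """Boolean contact map from one representative atom per residue.

    coords: (N, 3) array of coordinates in angstroms (hypothetical input).
    threshold: distance cutoff in angstroms; 8.0 is an assumed convention,
               not a value taken from the abstract.
    min_separation: minimum sequence separation for a pair to count.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude pairs that are too close in sequence (including i == j).
    idx = np.arange(len(coords))
    far_enough = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return (dist <= threshold) & far_enough

def precision(predicted, reference):
    """Fraction of predicted contacts that are also reference contacts."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    n_pred = predicted.sum()
    if n_pred == 0:
        return 0.0
    return float((predicted & reference).sum() / n_pred)

# Toy example: three residues on a line (made-up coordinates).
predicted_coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
reference_coords = [[0.0, 0.0, 0.0], [6.0, 0.0, 0.0], [21.0, 0.0, 0.0]]
pred_map = contact_map(predicted_coords)
ref_map = contact_map(reference_coords)
score = precision(pred_map, ref_map)
```

In this sketch, a contact map derived from a predicted structure is compared with one derived from an experimental structure. Averaging the per-protein score over a benchmark would give one possible reading of "average precision", though the exact contact definition and averaging scheme used in the paper are not specified in the abstract.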