SpatConv Enables the Accurate Prediction of Protein Binding Sites by a Pretrained Protein Language Model and an Interpretable Bio-spatial Convolution.

Authors

Guan Mingming, Han Jiyun, Zhang Shizhuo, Zheng Hongyu, Liu Juntao

Affiliations

School of Mathematics and Statistics, Shandong University, Weihai 264209, China.

Department of Radiation Oncology, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan 250012, China.

Publication information

Research (Wash D C). 2025 Jul 8;8:0773. doi: 10.34133/research.0773. eCollection 2025.

DOI: 10.34133/research.0773

PMID: 40636133

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12237623/

Abstract

Protein interactions with other molecules, such as proteins, peptides, or small-molecule ligands, play a critical role in biological processes, and identifying protein binding sites is crucial for understanding the mechanisms underlying diseases such as cancer. Traditional protein binding site predictors usually extract residue features manually and then apply graph- or point-cloud-based architectures borrowed from other fields. As a result, they suffer substantial information loss and limited learning ability, and they fail to capture residue binding patterns. To address these challenges, we introduce SpatConv, a general network that predicts the protein residues that bind proteins, peptides, and metal ions. SpatConv extracts sequence features from a large pretrained protein language model and structure features from a local coordinate framework. It learns residue binding patterns through a specially designed, graph-free bio-spatial convolution that characterizes the complex spatial environment around each residue. After training and testing, SpatConv shows substantial improvements over state-of-the-art predictors and reveals novel biological insights into the relationship between binding sites and physicochemical properties. Notably, SpatConv performs robustly on both predicted and experimental structures, which enhances its reliability. Additionally, when applied to the spike protein structure of severe acute respiratory syndrome coronavirus 2, SpatConv successfully identifies antibody binding sites and predicts potential binding regions, providing strong evidence to support new drug development. A user-friendly online server for SpatConv is freely available at http://liulab.top/SpatConv/server.
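The abstract names two structural ingredients: a per-residue local coordinate framework and a graph-free convolution over each residue's spatial surroundings. The sketch below is not SpatConv's implementation, since the abstract does not give the exact frame definition or kernel. It assumes a common Gram-Schmidt frame built from the backbone N, CA, and C atoms, and it swaps in a hypothetical Gaussian-and-direction-weighted average as the kernel. Neighbors are found directly from coordinates within a radius, without storing an explicit graph. The names `local_frame`, `spatial_conv`, `radius`, `sigma`, and `kernel_dir` are illustrative only. The point it demonstrates is why local frames are useful: neighbor positions expressed in a residue's own frame do not change when the whole structure is rotated or translated.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def local_frame(n: Vec, ca: Vec, c: Vec) -> tuple[Vec, Vec, Vec]:
    """Orthonormal frame from backbone N, CA, C (assumed Gram-Schmidt construction)."""
    e1 = normalize(sub(c, ca))                      # first axis: CA -> C
    u = sub(n, ca)
    e2 = normalize(sub(u, scale(e1, dot(u, e1))))  # CA -> N with the e1 component removed
    e3 = cross(e1, e2)                              # right-handed third axis
    return e1, e2, e3


def to_local(point: Vec, origin: Vec, frame: tuple[Vec, Vec, Vec]) -> Vec:
    """Coordinates of `point` relative to `origin`, expressed in `frame`."""
    d = sub(point, origin)
    return (dot(d, frame[0]), dot(d, frame[1]), dot(d, frame[2]))


def spatial_conv(backbone, features, radius=10.0, sigma=4.0, kernel_dir=(0.5, 0.0, 0.0)):
    """Hypothetical kernel: weighted average of neighbor features within `radius`.

    Weights combine a Gaussian on distance with a direction-dependent factor
    computed in the center residue's local frame. Fixed numbers stand in for
    what would be learned parameters in a real model.
    """
    out = []
    for i, (n_i, ca_i, c_i) in enumerate(backbone):
        frame = local_frame(n_i, ca_i, c_i)
        num = [0.0] * len(features[i])
        den = 0.0
        for j, (_, ca_j, _) in enumerate(backbone):
            p = to_local(ca_j, ca_i, frame)  # neighbor CA in residue i's frame
            r2 = dot(p, p)
            if r2 > radius * radius:
                continue  # outside the spatial neighborhood
            w = math.exp(-r2 / (2 * sigma ** 2)) * math.exp(dot(kernel_dir, p) / radius)
            den += w
            for k, f in enumerate(features[j]):
                num[k] += w * f
        # den > 0 because residue i is always its own neighbor (r2 == 0, w == 1)
        out.append([x / den for x in num])
    return out
```

In a trained model, `kernel_dir` and `sigma` would be replaced by learned parameters, and the per-residue feature vectors would come from a protein language model embedding; here they are fixed values chosen only to make the invariance property easy to check.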
