Sun Yawen, Wang Rui, Luo Zeyu, Tan Lejia, Liu Junhao, Li Ruimeng, Wei Dongqing, Zhang Yu-Juan
College of Life Science, Chongqing Normal University, No. 37 University Town Road, High-tech District, Chongqing 401331, P.R. China.
State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Rd., Minhang District, Shanghai 200030, P.R. China.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf434.
The prediction of binary protein-protein interactions (PPIs) is essential for protein engineering, but a major challenge for deep learning-based methods is that the model's decision-making process remains opaque. To address this challenge, we propose the ESM2_AMP framework, which uses the ESM2 protein language model to extract segment features from actual amino acid sequences and integrates a Transformer model for feature fusion in binary PPI prediction. Further, two distinct models, ESM2_AMPS and ESM2_AMP_CSE, are developed to systematically explore the contribution of segment features, and of segment features combined with special-token features, to the model's decision-making process. The experimental results reveal that, in the model relying on segment features, segments with high attention weights correlate strongly with known functional regions of the amino acid sequences. This suggests that attending to these segments helps the model capture biologically relevant functional and interaction-related information. By analyzing the coverage relationship between high-attention sequence fragments and functional regions, we validated the model's ability to capture key segment features of PPIs and revealed the critical role of functional domains in PPIs. This finding not only advances interpretability methods for sequence-based prediction models but also provides biological evidence for the important regulatory role of functional sequences in protein-protein interactions, offering cross-disciplinary insights for algorithm optimization and experimental validation research in computational biology.
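The abstract does not state how segments are cut or how coverage is scored. The following is a minimal Python sketch, under stated assumptions, of the kind of coverage analysis it describes: a sequence is split into fixed-length windows, the top-k windows by attention weight are selected, and two illustrative metrics are computed against annotated functional regions. The window length, stride, k, the attention weights, the region coordinates, and the function names (`segment_spans`, `top_k_segments`, `segment_hit_rate`, `residue_coverage`) are all hypothetical; in practice the weights would come from the trained model's attention over segment features, not from hand-written numbers.

```python
from typing import List, Sequence, Tuple

# Residue intervals are 0-based and half-open: (start, end) covers start..end-1.
Interval = Tuple[int, int]


def segment_spans(length: int, seg_len: int, stride: int) -> List[Interval]:
    """Cut a sequence of `length` residues into windows of `seg_len` residues.

    Windows advance by `stride`; the last window is clipped at the sequence end.
    seg_len and stride are hypothetical settings, not values from the paper.
    """
    if seg_len <= 0 or stride <= 0:
        raise ValueError("seg_len and stride must be positive")
    if length <= 0:
        return []
    spans: List[Interval] = []
    start = 0
    while True:
        end = min(start + seg_len, length)
        spans.append((start, end))
        if end >= length:
            break
        start += stride
    return spans


def top_k_segments(spans: Sequence[Interval], weights: Sequence[float], k: int) -> List[Interval]:
    """Return the k spans with the largest attention weights (ties keep input order)."""
    if len(spans) != len(weights):
        raise ValueError("need exactly one weight per segment")
    order = sorted(range(len(spans)), key=lambda i: weights[i], reverse=True)
    return [spans[i] for i in order[:k]]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one residue."""
    return a[0] < b[1] and b[0] < a[1]


def segment_hit_rate(top: Sequence[Interval], regions: Sequence[Interval]) -> float:
    """Fraction of high-attention segments overlapping at least one annotated region."""
    if not top:
        return 0.0
    hits = sum(1 for s in top if any(overlaps(s, r) for r in regions))
    return hits / len(top)


def residue_coverage(top: Sequence[Interval], regions: Sequence[Interval]) -> float:
    """Fraction of annotated-region residues covered by the union of the top segments."""
    covered = {i for s, e in top for i in range(s, e)}
    in_regions = {i for s, e in regions for i in range(s, e)}
    if not in_regions:
        return 0.0
    return len(covered & in_regions) / len(in_regions)


if __name__ == "__main__":
    # Hypothetical 100-residue protein, 20-residue windows, made-up attention weights.
    spans = segment_spans(100, seg_len=20, stride=20)
    weights = [0.05, 0.40, 0.10, 0.30, 0.15]
    top = top_k_segments(spans, weights, k=2)  # [(20, 40), (60, 80)]
    domain = [(25, 45)]  # hypothetical annotated functional region
    print(segment_hit_rate(top, domain))  # 0.5
    print(residue_coverage(top, domain))  # 0.75
```

The two metrics answer slightly different questions: the hit rate asks how many high-attention segments land on a functional region, while residue coverage asks how much of the region those segments span. Judging whether an observed overlap exceeds chance would additionally need a baseline, for example randomly placed segments of the same size; that comparison is an assumption of this sketch, not a step stated in the abstract.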