Huang Junxiong, Li Weikang, Xiao Bin, Zhao Chunqing, Zheng Hancheng, Li Yingrui, Wang Jun
iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China.
iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China.
iScience. 2024 Aug 30;27(10):110850. doi: 10.1016/j.isci.2024.110850. eCollection 2024 Oct 18.
Protein-peptide interactions play a pivotal role in fields such as drug development, yet they remain underexplored experimentally and challenging to model computationally. Herein, we introduce PepCA, a sequence-based approach for predicting peptide-binding sites on proteins. A primary obstacle in predicting peptide-protein interactions is the difficulty of acquiring precise protein structures, coupled with the uncertainty of peptide conformations. To address this, we first encode protein sequences using the pre-trained Evolutionary Scale Modeling 2 (ESM-2) model to extract latent structural information. We also developed a multi-input coattention mechanism that concurrently updates the encodings of both peptide and protein residues. PepCA integrates this module within an encoder-decoder structure. The model identifies binding sites with high precision, offering insights for peptide drug development and protein science.
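To make the idea of concurrently updating both encodings concrete, the following is a minimal sketch of one generic co-attention step, not the PepCA implementation. Everything in it is an assumption for illustration: the function names (`cross_attend`, `coattention_step`) are hypothetical, the random vectors merely stand in for residue embeddings such as those an ESM-2 encoder would produce, and the learned projection matrices, multiple heads, and normalization that a real model would use are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of rows)."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query row attends over keys_values.

    Hypothetical helper; no learned projections are applied.
    """
    dim = len(queries[0])
    scores = matmul(queries, transpose(keys_values))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, keys_values), weights


def coattention_step(protein, peptide):
    """Update both encodings from the same pre-update inputs.

    Protein residues attend over peptide residues and vice versa; each
    context vector is added back to its residue as a residual update.
    """
    prot_ctx, prot_w = cross_attend(protein, peptide)
    pep_ctx, pep_w = cross_attend(peptide, protein)
    new_protein = [[x + c for x, c in zip(r, ctx)] for r, ctx in zip(protein, prot_ctx)]
    new_peptide = [[x + c for x, c in zip(r, ctx)] for r, ctx in zip(peptide, pep_ctx)]
    return new_protein, new_peptide, prot_w, pep_w


# Toy inputs: 6 protein residues and 3 peptide residues, 4 dimensions each.
# Random vectors are placeholders for real sequence embeddings.
random.seed(0)
dim = 4
protein = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
peptide = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

new_protein, new_peptide, prot_w, pep_w = coattention_step(protein, peptide)
print(len(new_protein), len(new_peptide))
```

Both attention passes read the original `protein` and `peptide` encodings, so neither update sees the other's result; that is one plausible reading of "concurrently" and may differ from the authors' design. In a sketch like this, a per-residue classifier over the updated protein encodings would then score each residue as binding or non-binding.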