

Does protein pretrained language model facilitate the prediction of protein-ligand interaction?

Affiliations

Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.

Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.

Publication Information

Methods. 2023 Nov;219:8-15. doi: 10.1016/j.ymeth.2023.08.016. Epub 2023 Sep 9.

Abstract

Protein-ligand interaction (PLI) is a critical step for drug discovery. Recently, protein pretrained language models (PLMs) have showcased exceptional performance across a wide range of protein-related tasks. However, a significant heterogeneity exists between the PLM and PLI tasks, leading to a degree of uncertainty. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). The model with pre-training consistently achieves improved performance and decreased time cost, demonstrating that pre-training enhances both the accuracy and efficiency of PLI prediction. By quantitatively assessing transferability, the optimal PLM for each PLI task is identified without the need for costly transfer experiments. Additionally, we examine the contributions of PLMs to the distribution of the feature space, highlighting the improved discriminability after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. Code and data are freely available at https://github.com/brian-zZZ/PLM-PLI.
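The general pipeline the abstract describes — feeding a frozen PLM's protein embeddings, together with a ligand representation, into a downstream predictor — can be sketched as below. The embedding dimensions, the mean-pooling step, the binary fingerprint, and the small MLP head are all illustrative assumptions for this sketch, not the paper's actual architecture; in practice the residue embeddings would come from TAPE, ESM-1b, or ProtTrans, and the fingerprint from a cheminformatics toolkit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real features (assumption: random placeholders).
# In practice, per-residue embeddings come from a frozen PLM such as ESM-1b,
# and the ligand is encoded, e.g., as a binary molecular fingerprint.
protein_residue_emb = rng.normal(size=(350, 1280))          # L residues x d_model
ligand_fp = rng.integers(0, 2, size=2048).astype(float)     # binary fingerprint

# Mean-pool residue embeddings into one fixed-size protein vector.
protein_vec = protein_residue_emb.mean(axis=0)              # shape (1280,)

# Concatenate protein and ligand features, then score the pair with a
# tiny randomly initialized MLP head (illustrative only).
x = np.concatenate([protein_vec, ligand_fp])                # shape (3328,)
W1 = rng.normal(scale=0.02, size=(3328, 128))
b1 = np.zeros(128)
W2 = rng.normal(scale=0.02, size=(128, 1))
b2 = np.zeros(1)

h = np.maximum(x @ W1 + b1, 0.0)                            # ReLU hidden layer
score = float(1.0 / (1.0 + np.exp(-(h @ W2 + b2))))         # sigmoid probability
print(f"interaction score: {score:.3f}")
```

Because the PLM is frozen here, only the small head would need training for each PLI task, which is consistent with the reduced time cost the abstract reports.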

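The abstract's claim of "improved discriminability after pre-training" can be illustrated with a simple between-class / within-class distance ratio computed on embeddings: higher ratios indicate a more separable feature space. The metric below and the simulated embeddings are assumptions for illustration only, not the analysis used in the paper.

```python
import numpy as np

def discriminability(X, y):
    """Ratio of mean between-class to mean within-class squared distance.
    Higher values indicate a more linearly separable feature space."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    within = np.mean([((X[y == c] - centroids[i]) ** 2).sum(axis=1).mean()
                      for i, c in enumerate(classes)])
    overall = X.mean(axis=0)
    between = ((centroids - overall) ** 2).sum(axis=1).mean()
    return between / within

rng = np.random.default_rng(1)

# Simulated two-class embeddings (assumption): the "pretrained" features are
# drawn with a larger class separation than the "from-scratch" ones.
y = np.repeat([0, 1], 100)
scratch = rng.normal(size=(200, 64)) + y[:, None] * 0.2
pretrained = rng.normal(size=(200, 64)) + y[:, None] * 2.0

print(f"scratch:    {discriminability(scratch, y):.4f}")
print(f"pretrained: {discriminability(pretrained, y):.4f}")
```

Applied to real embeddings of active versus inactive protein-ligand pairs, a rise in such a ratio after pre-training would mirror the improved class separation the abstract highlights.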
