THPLM: a sequence-based deep learning framework for protein stability changes prediction upon point variations using pretrained protein language model.

Affiliations

School of Information Science and Technology, Institution of Computational Biology, Northeast Normal University, Changchun 130117, China.

Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun 130122, China.

Publication information

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad646.

Abstract

MOTIVATION

Quantitative determination of protein thermodynamic stability is a critical step in protein and drug design. Reliable prediction of protein stability changes caused by point variations contributes to the development of related fields. Over the past decades, dozens of structure-based and sequence-based methods have been proposed, showing good prediction performance. Despite this progress, it remains necessary to explore wild-type and variant protein representations to address how a protein stability change can be represented from the perspective of the global sequence. With the development of learning-based structure prediction, protein language models (PLMs) have produced accurate, high-quality predictions of protein structure. Because PLMs capture atomic-level structural information, they can help explain how single-point variations cause functional changes.

RESULTS

Here, we proposed THPLM, a sequence-based deep learning model that predicts stability changes using Meta's ESM-2. With ESM-2 and a simple convolutional neural network, THPLM achieved performance comparable to or better than most existing methods, both sequence-based and structure-based. Furthermore, the experimental results indicate that the sequence representations generated by PLMs can effectively improve protein function prediction.
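As a rough illustration of how a wild-type/variant pair might be turned into model input, the sketch below applies a point variation to a sequence and computes a variant-minus-wild-type feature vector. It is not the THPLM implementation: the function names, the `A4G`-style variation notation, the example sequence, and the use of a difference vector are all assumptions made for illustration, and `toy_embedding` is a hypothetical stand-in (amino-acid composition) for the ESM-2 embedding step so that the example runs with the standard library only.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def apply_point_variation(wild_type: str, variation: str) -> str:
    """Return the variant sequence for a variation such as 'A4G' (1-based position)."""
    ref, pos, alt = variation[0], int(variation[1:-1]), variation[-1]
    if alt not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid {alt!r}")
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"position {pos} is outside the sequence")
    if wild_type[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {wild_type[pos - 1]}"
        )
    return wild_type[: pos - 1] + alt + wild_type[pos:]


def toy_embedding(sequence: str) -> List[float]:
    """Hypothetical placeholder for a PLM embedding: residue composition (20 values).

    A real pipeline would call ESM-2 here; this stand-in only keeps the sketch runnable.
    """
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def difference_features(wild_type: str, variation: str) -> List[float]:
    """Variant-minus-wild-type features (an assumed input for a downstream regressor)."""
    variant = apply_point_variation(wild_type, variation)
    wt_vec = toy_embedding(wild_type)
    var_vec = toy_embedding(variant)
    return [v - w for v, w in zip(var_vec, wt_vec)]


if __name__ == "__main__":
    wild_type = "MKTAYIAK"  # made-up sequence, for illustration only
    print(apply_point_variation(wild_type, "A4G"))
    print(difference_features(wild_type, "A4G"))
```

In a real setting, the placeholder would be replaced by per-residue or pooled ESM-2 representations, and the resulting features would be passed to a convolutional network trained to regress the measured stability change.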

AVAILABILITY AND IMPLEMENTATION

The source code of THPLM and the testing data are available at https://github.com/FPPGroup/THPLM.
