
Accurate and Rapid Prediction of Protein pKa: Protein Language Models Reveal the Sequence-pKa Relationship.

Authors

Xu Shijie, Onoda Akira

Affiliations

Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan.

Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Japan.

Publication information

J Chem Theory Comput. 2025 Apr 8;21(7):3752-3764. doi: 10.1021/acs.jctc.4c01288. Epub 2025 Mar 26.

Abstract

Protein pKa prediction is a key challenge in computational biology. In this study, we present pKALM, a novel deep learning-based method for high-throughput protein pKa prediction. pKALM uses a protein language model (PLM) to capture the complex sequence-structure relationships of proteins. Although pKa prediction has traditionally been considered a structure-based problem, our results show that a PLM pretrained on large-scale protein sequence databases can effectively learn this relationship and achieve state-of-the-art performance. pKALM predicts the pKa values of six residue types (Asp, Glu, His, Lys, Cys, and Tyr) and the two termini with high precision and efficiency. It performs well for both exposed and buried residues, whose pKa values often deviate from the standard values measured in solvent. We report a novel finding: predicted protein isoelectric points (pI) can be used to improve the accuracy of pKa prediction. High-throughput pKa prediction of the human proteome using pKALM achieves a speed of 4,965 pKa predictions per second, which is several orders of magnitude faster than existing state-of-the-art methods. The case studies illustrate both the efficacy of pKALM in estimating pKa values and the constraints of the method. pKALM will thus be a valuable tool for researchers in the fields of biochemistry, biophysics, and drug design.

