IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins Using Large Language Models.

Authors

Mollaei Parisa, Sadasivam Danush, Guntuboina Chakradhar, Barati Farimani Amir

Affiliations

Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.

Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.

Publication information

J Phys Chem B. 2024 Dec 12;128(49):12030-12037. doi: 10.1021/acs.jpcb.4c02507. Epub 2024 Nov 25.

DOI:10.1021/acs.jpcb.4c02507

PMID:39586094

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11647883/

Abstract

Intrinsically disordered proteins (IDPs) constitute a large class of proteins that lack a stable structure yet perform significant functions. Their existence challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, IDPs exhibit diverse biological functions, influencing cellular processes and shedding light on disease mechanisms. However, characterizing this class of proteins through experiments or simulations is expensive. We therefore designed a machine learning (ML) model that relies solely on amino acid sequences. In this study, we introduce IDP-Bert, a deep-learning architecture that leverages Transformers and protein language models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including radius of gyration, end-to-end decorrelation time, and heat capacity.
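For readers unfamiliar with the first predicted property, the radius of gyration measures how compact a chain conformation is. The abstract does not say how the reference values were computed, so the following is only a minimal sketch of the textbook equal-mass definition (root-mean-square distance of the beads from their centroid). The function name `radius_of_gyration` and the bead coordinates are hypothetical and are not taken from the paper.

```python
import math


def radius_of_gyration(coords):
    """Equal-mass radius of gyration for a list of (x, y, z) points.

    Illustrative only: assumes every bead has the same mass, which may
    differ from how the paper's reference values were obtained.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")

    # Centroid of the points (the center of mass when masses are equal).
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n

    # Mean squared distance of the points from the centroid.
    msd = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n

    return math.sqrt(msd)


# Hypothetical two-bead "chain" with the beads 2 length units apart.
beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(radius_of_gyration(beads))
```

A model such as the one described would be trained to predict this kind of scalar directly from the amino acid sequence, without needing coordinates at inference time.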

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b48/11647883/6b5cd93b052d/jp4c02507_0001.jpg

Similar articles

IDP-LM: Prediction of protein intrinsic disorder and disorder functions based on language models.

PLoS Comput Biol. 2023 Nov 22;19(11):e1011657. doi: 10.1371/journal.pcbi.1011657. eCollection 2023 Nov.

Effects of Sequence Composition, Patterning and Hydrodynamics on the Conformation and Dynamics of Intrinsically Disordered Proteins.

Int J Mol Sci. 2023 Jan 11;24(2):1444. doi: 10.3390/ijms24021444.

Accelerated Missense Mutation Identification in Intrinsically Disordered Proteins Using Deep Learning.

Biomacromolecules. 2025 Apr 14;26(4):2106-2115. doi: 10.1021/acs.biomac.4c01124. Epub 2025 Mar 12.

Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning.

J Chem Inf Model. 2024 Apr 8;64(7):2901-2911. doi: 10.1021/acs.jcim.3c01202. Epub 2023 Oct 26.

Sequence Effects on Size, Shape, and Structural Heterogeneity in Intrinsically Disordered Proteins.

J Phys Chem B. 2019 Apr 25;123(16):3462-3474. doi: 10.1021/acs.jpcb.9b02575. Epub 2019 Apr 15.

CIDER: Resources to Analyze Sequence-Ensemble Relationships of Intrinsically Disordered Proteins.

Biophys J. 2017 Jan 10;112(1):16-21. doi: 10.1016/j.bpj.2016.11.3200.

Predicting the Dynamic Interaction of Intrinsically Disordered Proteins.

J Chem Inf Model. 2024 Sep 9;64(17):6768-6777. doi: 10.1021/acs.jcim.4c00930. Epub 2024 Aug 20.

Sizes, conformational fluctuations, and SAXS profiles for intrinsically disordered proteins.

Protein Sci. 2025 Apr;34(4):e70067. doi: 10.1002/pro.70067.

Predicting the sequence-dependent backbone dynamics of intrinsically disordered proteins.

Elife. 2024 Oct 30;12:RP88958. doi: 10.7554/eLife.88958.

Cited by

Protein Structure-Function Relationship: A Kernel-PCA Approach for Reaction Coordinate Identification.

J Chem Theory Comput. 2025 Jul 22;21(14):7122-7130. doi: 10.1021/acs.jctc.5c00483. Epub 2025 Jul 14.

Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties.

J Chem Inf Model. 2025 Jan 13;65(1):83-91. doi: 10.1021/acs.jcim.4c01443. Epub 2024 Dec 19.

References

GPCR-BERT: Interpreting Sequential Design of G Protein-Coupled Receptors Using Protein Language Models.

J Chem Inf Model. 2024 Feb 26;64(4):1134-1144. doi: 10.1021/acs.jcim.3c01706. Epub 2024 Feb 10.

Direct prediction of intrinsically disordered protein conformational properties from sequence.

Nat Methods. 2024 Mar;21(3):465-476. doi: 10.1038/s41592-023-02159-5. Epub 2024 Jan 31.

Conformational ensembles of the human intrinsically disordered proteome.

Nature. 2024 Feb;626(8000):897-904. doi: 10.1038/s41586-023-07004-5. Epub 2024 Jan 31.

Dynamics and interactions of intrinsically disordered proteins.

Curr Opin Struct Biol. 2024 Feb;84:102734. doi: 10.1016/j.sbi.2023.102734. Epub 2023 Nov 30.

PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction.

J Phys Chem Lett. 2023 Nov 23;14(46):10427-10434. doi: 10.1021/acs.jpclett.3c02398. Epub 2023 Nov 13.

Unveiling Switching Function of Amino Acids in Proteins Using a Machine Learning Approach.

J Chem Theory Comput. 2023 Nov 28;19(22):8472-8480. doi: 10.1021/acs.jctc.3c00665. Epub 2023 Nov 6.

Large language models in medicine.

Nat Med. 2023 Aug;29(8):1930-1940. doi: 10.1038/s41591-023-02448-8. Epub 2023 Jul 17.

Activity Map and Transition Pathways of G Protein-Coupled Receptor Revealed by Machine Learning.

J Chem Inf Model. 2023 Apr 24;63(8):2296-2304. doi: 10.1021/acs.jcim.3c00032. Epub 2023 Apr 10.

Fuzzy Drug Targets: Disordered Proteins in the Drug-Discovery Realm.

ACS Omega. 2023 Mar 8;8(11):9729-9747. doi: 10.1021/acsomega.2c07708. eCollection 2023 Mar 21.

Perspectives on evolutionary and functional importance of intrinsically disordered proteins.

Int J Biol Macromol. 2023 Jan 1;224:243-255. doi: 10.1016/j.ijbiomac.2022.10.120. Epub 2022 Oct 17.
