Mollaei Parisa, Sadasivam Danush, Guntuboina Chakradhar, Barati Farimani Amir
Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
J Phys Chem B. 2024 Dec 12;128(49):12030-12037. doi: 10.1021/acs.jpcb.4c02507. Epub 2024 Nov 25.
Intrinsically disordered proteins (IDPs) constitute a large class of proteins that lack a stable structure yet perform significant functions. The existence of IDPs challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, IDPs exhibit diverse biological functions, influence cellular processes, and shed light on disease mechanisms. However, characterizing this class of proteins through experiments or simulations is expensive. Consequently, we designed a machine learning (ML) model that relies solely on amino acid sequences. In this study, we introduce IDP-Bert, a deep-learning architecture that leverages Transformers and protein language models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including radius of gyration, end-to-end decorrelation time, and heat capacity.
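For readers unfamiliar with the first target property, the sketch below shows how a radius of gyration is conventionally computed from a set of coordinates: the root-mean-square distance of the beads from their centroid. It is an illustration of the quantity only, not the authors' pipeline. The function name `radius_of_gyration`, the equal-mass assumption, and the toy two-bead chain are all assumptions introduced here.

```python
import math


def radius_of_gyration(coords):
    """Radius of gyration for equal-mass beads (assumption: no mass weighting).

    coords is a sequence of (x, y, z) tuples. The result is the square root
    of the mean squared distance of the beads from their centroid.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")

    # Centroid of the beads (equals the center of mass when masses are equal).
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n

    # Mean squared distance from the centroid.
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


# Hypothetical toy chain: two beads 2 units apart, each 1 unit from the centroid.
toy_chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(toy_chain)
```

In a sequence-to-property setting, a value like this, averaged over a simulated ensemble, would serve as the regression label paired with each amino acid sequence.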