
Transferable deep generative modeling of intrinsically disordered protein conformations.

Authors

Giacomo Janson, Michael Feig

Affiliation

Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA.

Publication information

bioRxiv. 2024 Feb 8:2024.02.08.579522. doi: 10.1101/2024.02.08.579522.

DOI: 10.1101/2024.02.08.579522
PMID: 38370653
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10871340/
Abstract

Intrinsically disordered proteins have dynamic structures through which they play key biological roles. Elucidating their conformational ensembles is a challenging problem that requires the integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder, which learns a representation of protein geometry, with a diffusion model, which samples novel conformations in the encoded space. idpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences that have no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
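
As a rough illustration of the two-stage idea described in the abstract (a diffusion model samples in an encoded space, and a decoder maps the samples back to 3D geometry), the following is a minimal sketch and not the idpSAM implementation. It assumes NumPy and a standard DDPM-style linear noise schedule. Both trained networks are replaced by hypothetical stand-ins: `toy_noise_predictor` uses the closed-form optimal noise estimate for latents that are standard normal, and `toy_decoder` is a random linear map to per-residue xyz coordinates. All function names, the latent dimension, and the schedule parameters are assumptions made for illustration.

```python
import numpy as np

def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(z_t, alpha_bar_t):
    """Hypothetical stand-in for a trained, sequence-conditioned network.

    If clean latents were distributed as N(0, I), the optimal noise
    estimate at step t would be sqrt(1 - alpha_bar_t) * z_t.
    """
    return np.sqrt(1.0 - alpha_bar_t) * z_t

def toy_decoder(z, weights):
    """Hypothetical stand-in for a trained decoder: latent -> xyz."""
    return z @ weights

def sample_ensemble(n_conformations, n_residues, latent_dim=8, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule()

    # Start from pure Gaussian noise in the latent space.
    z = rng.standard_normal((n_conformations, n_residues, latent_dim))

    # DDPM ancestral sampling: denoise step by step, from t = T-1 down to 0.
    for t in reversed(range(len(betas))):
        eps_hat = toy_noise_predictor(z, alpha_bars[t])
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise

    # Decode each latent conformation to per-residue 3D coordinates.
    weights = rng.standard_normal((latent_dim, 3))
    return toy_decoder(z, weights)

ensemble = sample_ensemble(n_conformations=16, n_residues=20)
print(ensemble.shape)  # (16, 20, 3)
```

In a real latent diffusion model, the noise predictor would be conditioned on the amino acid sequence and the decoder would be the trained half of the autoencoder. The toy versions here only show how the sampling loop and the decoding step fit together.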


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3f8/10871340/ef8aac1ac8ce/nihpp-2024.02.08.579522v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3f8/10871340/d3edd1d7a72f/nihpp-2024.02.08.579522v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3f8/10871340/32eae6d173e2/nihpp-2024.02.08.579522v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3f8/10871340/635bf3e022f1/nihpp-2024.02.08.579522v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3f8/10871340/083721ef6def/nihpp-2024.02.08.579522v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3f8/10871340/0d577d684fa4/nihpp-2024.02.08.579522v1-f0006.jpg

Similar articles

1
Transferable deep generative modeling of intrinsically disordered protein conformations.
bioRxiv. 2024 Feb 8:2024.02.08.579522. doi: 10.1101/2024.02.08.579522.
2
Transferable deep generative modeling of intrinsically disordered protein conformations.
PLoS Comput Biol. 2024 May 23;20(5):e1012144. doi: 10.1371/journal.pcbi.1012144. eCollection 2024 May.
3
Machine Learning Generation of Dynamic Protein Conformational Ensembles.
Molecules. 2023 May 12;28(10):4047. doi: 10.3390/molecules28104047.
4
Direct generation of protein conformational ensembles via machine learning.
Nat Commun. 2023 Feb 11;14(1):774. doi: 10.1038/s41467-023-36443-x.
5
Phanto-IDP: compact model for precise intrinsically disordered protein backbone generation and enhanced sampling.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad429.
6
Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning.
bioRxiv. 2024 Dec 9:2024.05.05.592587. doi: 10.1101/2024.05.05.592587.
7
Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning.
Res Sq. 2024 Jun 28:rs.3.rs-4301803. doi: 10.21203/rs.3.rs-4301803/v1.
8
Machine-learning-based methods to generate conformational ensembles of disordered proteins.
Biophys J. 2024 Jan 2;123(1):101-113. doi: 10.1016/j.bpj.2023.12.001. Epub 2023 Dec 5.
9
Artificial intelligence guided conformational mining of intrinsically disordered proteins.
Commun Biol. 2022 Jun 20;5(1):610. doi: 10.1038/s42003-022-03562-y.
10
Enhancing Conformational Sampling for Intrinsically Disordered and Ordered Proteins by Variational Autoencoder.
Int J Mol Sci. 2023 Apr 7;24(8):6896. doi: 10.3390/ijms24086896.
