State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad429.
The biological function of proteins is determined not only by their static structures but also by the dynamic properties of their conformational ensembles. Numerous high-accuracy deep learning tools for static structure prediction have recently been developed; however, efficient and accurate methods for exploring protein dynamic conformations are still lacking. Traditionally, studies of protein dynamics have relied on molecular dynamics (MD) simulations, which incur substantial computational cost at all-atom resolution and struggle to adequately sample conformational spaces separated by high energy barriers. To overcome these limitations, various enhanced sampling techniques have been developed to accelerate sampling in MD. Traditional enhanced sampling approaches such as replica exchange molecular dynamics (REMD) and frontier expansion sampling (FEXS) often remain tied to MD simulation and still demand substantial computational resources and time. Variational autoencoders (VAEs), a classic class of deep generative models, are not restricted by potential energy landscapes and can explore conformational spaces more efficiently than traditional methods. However, VAEs often struggle to generate reasonable conformations for complex proteins, especially intrinsically disordered proteins (IDPs), which limits their use as an enhanced sampling method. In this study, we present a novel deep learning model, named Phanto-IDP, that uses a graph-based encoder to extract protein features and a transformer-based decoder combined with variational sampling to generate highly accurate protein backbones. Ten IDPs and four structured proteins were used to evaluate the sampling ability of Phanto-IDP. The results demonstrate that Phanto-IDP generates conformational ensembles with high fidelity and diversity, making it a suitable tool for improving the efficiency of MD simulations, sampling a broader protein conformational space, and generating continuous protein transition paths.
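The variational sampling step described in the abstract can be illustrated with a minimal, generic sketch. It assumes a standard VAE whose encoder outputs a mean `mu` and a log-variance `log_var` for each latent dimension; all function names are hypothetical, and this is not the Phanto-IDP implementation. New latent codes are drawn with the reparameterization trick, z = mu + sigma * eps with eps ~ N(0, 1), and a continuous path between two states is approximated by linear interpolation in latent space, one common VAE technique whose use by Phanto-IDP is not stated here. The trained decoder that would map each z to backbone coordinates is omitted.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw one latent vector z = mu + sigma * eps, with eps ~ N(0, 1) per dimension.

    Generic VAE reparameterization trick; sigma = exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def sample_latents(mu, log_var, n_samples, seed=0):
    """Generate n_samples latent codes around one encoded conformation.

    In a full model (assumed, not shown), each code would be passed to a
    trained decoder to produce a new backbone conformation.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    return [reparameterize(mu, log_var, rng) for _ in range(n_samples)]


def interpolate(z_a, z_b, n_steps):
    """Linearly interpolate between two latent codes, endpoints included.

    Decoding each intermediate code would give an approximate transition path.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2 to include both endpoints")
    path = []
    for i in range(n_steps):
        t = i / (n_steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


if __name__ == "__main__":
    samples = sample_latents([0.0, 1.0], [0.0, -2.0], n_samples=3, seed=42)
    print(len(samples), "latent samples drawn")
    print(interpolate([0.0, 0.0], [1.0, 2.0], n_steps=3))
```

For example, `interpolate([0.0, 0.0], [1.0, 2.0], 3)` returns `[[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]`. Sampling in latent space rather than integrating equations of motion is what frees a VAE from the potential energy landscape; the quality of the resulting conformations depends entirely on the decoder, which is the part the abstract says Phanto-IDP improves.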