Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States.
Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States.
J Chem Theory Comput. 2024 Apr 9;20(7):2689-2695. doi: 10.1021/acs.jctc.3c01057. Epub 2024 Mar 28.
Mapping the ensemble of protein conformations that contribute to function and can be targeted by small-molecule drugs remains an outstanding challenge. Here, we explore the use of variational autoencoders (VAEs) to reduce the dimensionality of the protein structure ensemble generation problem. We convert high-dimensional protein structural data into a continuous, low-dimensional representation, carry out a search in this space guided by a structure quality metric, and then use RoseTTAFold, guided by the sampled structural information, to generate 3D structures. We apply this approach to the cancer-relevant protein K-Ras: we train the VAE on a subset of the available K-Ras crystal structures and MD simulation snapshots, and assess the extent of sampling close to crystal structures withheld from training. We find that our latent space sampling procedure rapidly generates ensembles with high structural quality and samples within 1 Å of held-out crystal structures more consistently than MD simulation or AlphaFold2 prediction. The sampled structures recapitulate the cryptic pockets in the held-out K-Ras structures well enough to allow small-molecule docking.
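The "within 1 Å of held-out crystal structures" criterion can be illustrated with a minimal sketch. It assumes the distance is a coordinate RMSD after optimal rigid-body superposition (the Kabsch algorithm); the abstract does not state the exact metric or atom selection, so this is an assumption. The function names `kabsch_rmsd` and `fraction_within` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumes both arrays list the same atoms in the same order
    (for example, C-alpha atoms of matching residues).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


def fraction_within(samples, reference, cutoff=1.0):
    """Fraction of sampled structures within `cutoff` (in Å) of a reference.

    The 1.0 Å default mirrors the threshold quoted in the abstract;
    how the authors aggregate over samples is not specified there.
    """
    rmsds = [kabsch_rmsd(s, reference) for s in samples]
    return sum(r <= cutoff for r in rmsds) / len(rmsds)
```

Given a list of sampled coordinate arrays and one held-out structure, `fraction_within(samples, held_out)` would report how often sampling lands inside the cutoff, which is one plausible way to compare sampling consistency across methods.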