School of Physics, Peking University, Beijing 100871, China.
DP Technology, Beijing 100080, China.
J Chem Inf Model. 2024 Nov 25;64(22):8414-8426. doi: 10.1021/acs.jcim.4c00928. Epub 2024 Sep 28.
Accurate sampling of protein conformations is pivotal for advances in biology and medicine. Although deep learning has driven tremendous progress in protein structure prediction in recent years, models that can predict the different stable conformations of a protein with high accuracy and structural validity are still lacking. Here, we introduce UFConf, an approach designed for robust sampling of diverse protein conformations based solely on amino acid sequences. The method transforms AlphaFold2 into a diffusion model by implementing a conformation-based diffusion process and adapting the architecture to process diffused inputs effectively. To counteract the inherent conformational bias in the Protein Data Bank, we developed a novel hierarchical reweighting protocol based on structural clustering. Our evaluations demonstrate that UFConf outperforms existing methods in terms of successful sampling and structural validity. Comparisons with long-timescale molecular dynamics simulations show that UFConf can overcome the energy barriers encountered in those simulations and sample more efficiently. Furthermore, we showcase UFConf's utility in drug discovery through its application in neural protein-ligand docking. In a blind test, it accurately predicted a novel protein-ligand complex, underscoring its potential to impact real-world biological research. Additionally, we present other modes of sampling with UFConf, including partial sampling with a fixed motif, Langevin dynamics, and structural interpolation.
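The abstract does not spell out the hierarchical reweighting protocol, so the following is only a minimal sketch of one way cluster-based reweighting could work, not the authors' implementation. It assumes two hypothetical cluster labels per training structure (a sequence-level cluster and a structural cluster within it) and assigns sampling weights so that every sequence cluster carries equal total weight, split evenly across its structural clusters and then across their members. The function name `hierarchical_weights` and the two-level scheme are illustrative assumptions.

```python
from collections import Counter


def hierarchical_weights(seq_cluster, struct_cluster):
    """Return one sampling weight per structure; the weights sum to 1.

    seq_cluster[i]    -- hypothetical sequence-level cluster label of structure i
    struct_cluster[i] -- hypothetical structural cluster label of structure i,
                         defined within its sequence cluster
    """
    pairs = list(zip(seq_cluster, struct_cluster))
    # Number of distinct sequence clusters in the data set.
    n_seq = len(set(seq_cluster))
    # Number of distinct structural clusters inside each sequence cluster.
    n_struct = Counter(s for s, _ in set(pairs))
    # Number of structures in each (sequence cluster, structural cluster) pair.
    n_members = Counter(pairs)
    # Split weight evenly at each level of the hierarchy.
    return [1.0 / (n_seq * n_struct[s] * n_members[(s, c)]) for s, c in pairs]


# Toy data: protein "A" has three copies of conformation 0 and one of
# conformation 1; protein "B" has a single structure.
seq = ["A", "A", "A", "A", "B"]
conf = [0, 0, 0, 1, 0]
weights = hierarchical_weights(seq, conf)
```

In this toy case the over-represented conformation of "A" is down-weighted: each of its three copies receives 1/12, while the single rare conformation receives 1/4, so both conformations of "A" contribute equally during sampling.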