Graduate School of System Informatics, Kobe University, Kobe, Japan.
BMC Bioinformatics. 2023 Jun 5;24(1):233. doi: 10.1186/s12859-023-05354-5.
Three-dimensional structures of protein-ligand complexes provide valuable insights into their interactions and are crucial for molecular biological studies and drug design. However, their high-dimensional and multimodal nature hinders end-to-end modeling, and earlier approaches depend inherently on existing protein structures. To overcome these limitations and expand the range of complexes that can be accurately modeled, it is necessary to develop efficient end-to-end methods.
We introduce an equivariant diffusion-based generative model that learns the joint distribution of ligand and protein conformations conditioned on the molecular graph of a ligand and the sequence representation of a protein extracted from a pre-trained protein language model. Benchmark results show that this protein structure-free model is capable of generating diverse structures of protein-ligand complexes, including those with correct binding poses. Further analyses indicate that the proposed end-to-end approach is particularly effective when the ligand-bound protein structure is not available.
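The abstract does not specify the noise process, so the following is only a minimal sketch under stated assumptions. It shows one way a diffusion model can noise ligand and protein coordinates *jointly* while staying consistent with 3D symmetry. It assumes a DDPM-style forward process with a cosine schedule and projects both the clean coordinates and the noise onto the zero-center-of-mass subspace, a convention common in E(3)-equivariant diffusion models. The function names (`cosine_alpha_bar`, `remove_com`, `forward_noise`) and the `(N, 3)` layout holding all ligand atoms and protein residues together are hypothetical. The conditioning signals (the ligand's molecular graph and the protein language model embedding) would enter only the learned denoiser, which is not shown here.

```python
import numpy as np


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level alpha_bar_t for a cosine schedule (assumed, not from the paper)."""
    def f(u):
        return np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0)


def remove_com(x):
    """Project (N, 3) coordinates onto the zero-center-of-mass subspace (translation invariance)."""
    return x - x.mean(axis=0, keepdims=True)


def forward_noise(x0, t, T, eps):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a) x_0, (1 - a) I) on the zero-CoM subspace.

    x0:  (N, 3) joint coordinates of ligand atoms and protein residues (hypothetical layout)
    eps: (N, 3) standard Gaussian noise, passed in so the step is deterministic
    """
    a = cosine_alpha_bar(t, T)
    return np.sqrt(a) * remove_com(x0) + np.sqrt(1.0 - a) * remove_com(eps)


# Usage: rotating the input and the noise rotates the output (rotation equivariance).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(10, 3))
eps = rng.normal(size=(10, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
xt = forward_noise(x0, 500, 1000, eps)
xt_rot = forward_noise(x0 @ q.T, 500, 1000, eps @ q.T)
print(np.allclose(xt @ q.T, xt_rot))  # True
```

Because centering and scaling are linear operations that commute with rotations, the noised complex carries no preferred position or orientation. This property lets an equivariant denoiser treat all poses of the same complex consistently.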
The present results demonstrate the effectiveness and generative capability of our end-to-end complex structure modeling framework with diffusion-based generative models. We anticipate that this framework will lead to better modeling of protein-ligand complexes, and we expect further improvements and broad applications.