Advancing Ligand Docking through Deep Learning: Challenges and Prospects in Virtual Screening.

Affiliations

College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.

Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China.

Publication information

Acc Chem Res. 2024 May 21;57(10):1500-1509. doi: 10.1021/acs.accounts.4c00093. Epub 2024 Apr 5.

Abstract

Molecular docking, also termed ligand docking (LD), is a pivotal element of structure-based virtual screening (SBVS), used to predict the binding conformations and affinities of protein-ligand complexes. Traditional LD methodologies rely on a search-and-scoring framework: heuristic algorithms explore binding conformations, and scoring functions evaluate binding strength. However, to meet the throughput demands of SBVS, these algorithms and functions are often simplified, prioritizing speed over accuracy.

The emergence of deep learning (DL) has had a profound impact on fields ranging from natural language processing to computer vision and drug discovery. DeepMind's AlphaFold2 has shown that protein structures can be predicted accurately from amino acid sequences alone, highlighting the remarkable potential of DL for conformation prediction. This paradigm of directly predicting conformations circumvents the traditional search-scoring framework in LD, enhancing both accuracy and processing speed, and has thereby catalyzed a broader adoption of DL algorithms for binding pose prediction. Nevertheless, consensus on certain aspects remains elusive.

In this Account, we delineate the current status of employing DL to augment LD within the VS paradigm, highlighting our contributions to this domain, and discuss the challenges and future prospects that emerge from our own investigations. We first present an overview of VS and LD, followed by an introduction to DL paradigms, which deviate significantly from traditional search-scoring frameworks. We then examine the challenges in developing DL-based LD (DLLD), encompassing evaluation metrics, application scenarios, and the physical plausibility of predicted conformations.

In evaluating LD algorithms, it is essential to recognize that the metrics are multifaceted. While the accuracy of binding pose prediction, often measured by the success rate, is a key aspect, the scoring/screening power and computational speed of these algorithms are equally important, given the central role of LD tools in VS. Regarding application scenarios, early methods focused on blind docking, where the binding site is unknown. However, recent studies suggest that the emphasis within these models has shifted toward identifying binding sites rather than solely predicting binding poses. In contrast, LD with a known pocket has been shown to be more practical in VS. Physical plausibility poses another significant challenge. Although DLLD models often achieve higher success rates than traditional methods, they may generate poses with implausible local structures, such as incorrect bond angles or lengths, which are disadvantageous for postprocessing tasks such as visualization.

Finally, we discuss future perspectives for DLLD, emphasizing the need to improve generalization ability, strike a balance between speed and accuracy, account for protein conformational flexibility, and enhance physical plausibility. We also compare generative and regression algorithms in this context, exploring their respective strengths and potential.
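The abstract argues that pose accuracy (success rate), screening power, and physical plausibility should be judged together. The following is a minimal sketch of one common way to compute each, offered as an illustration rather than the authors' evaluation code. It assumes conventions the abstract does not state: a 2 Å RMSD cutoff on the top-ranked pose, ligand atoms already matched in the same order and frame (no superposition or symmetry correction), an enrichment factor (EF) taken over the top fraction of a library ranked by docking score with lower scores better, and a crude bond-length check against caller-supplied ideal lengths with a hypothetical 10% tolerance. All function names are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD between two poses of the same ligand.

    Assumption: atoms are listed in the same order and both poses are already
    in the protein's frame (no superposition, no symmetry correction).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(total / len(pred))


def success_rate(top_pose_rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose lies within `cutoff` angstroms (assumed 2.0)."""
    return sum(r < cutoff for r in top_pose_rmsds) / len(top_pose_rmsds)


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at the top `fraction` of a library ranked by docking score (lower = better)."""
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i])
    hits = sum(is_active[i] for i in ranked[:n_top])
    # EF = (hit rate in the top fraction) / (hit rate in the whole library)
    return (hits * n) / (n_top * n_actives)


def implausible_bonds(coords: Sequence[Coord],
                      bonds: Sequence[tuple[int, int, float]],
                      tol: float = 0.1) -> list[tuple[int, int]]:
    """Bonds (i, j, ideal_length) whose length deviates by more than `tol` (fractional)."""
    flagged = []
    for i, j, ideal in bonds:
        if abs(math.dist(coords[i], coords[j]) - ideal) / ideal > tol:
            flagged.append((i, j))
    return flagged


if __name__ == "__main__":
    print(round(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]), 3))  # 1.414
    print(success_rate([1.2, 3.5, 0.8, 2.4]))  # 0.5
    scores = [-9.1, -8.7, -8.5, -7.9, -7.5, -7.0, -6.8, -6.2, -5.9, -5.0]
    actives = [True, False, True] + [False] * 7
    print(enrichment_factor(scores, actives, fraction=0.2))  # 2.5
    coords = [(0.0, 0.0, 0.0), (1.54, 0.0, 0.0), (3.5, 0.0, 0.0)]
    print(implausible_bonds(coords, [(0, 1, 1.54), (1, 2, 1.54)]))  # [(1, 2)]
```

A pose can pass the RMSD cutoff used by `success_rate` while still being flagged by a local-geometry check such as `implausible_bonds`, which is why reporting only one of these metrics can overstate a DLLD model's practical usefulness. Real evaluations typically use symmetry-aware RMSD and chemistry-aware geometry bounds rather than these simplified stand-ins.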
