Hoegen Dijkhof Luuk R, Rönkkö Teemu K E, von Vegesack Hans C, Lenzing Jacob, Hauser Alexander S
Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark.
Center for Pharmaceutical Data Science, University of Copenhagen, Denmark.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf186.
Deep learning (DL) methods have substantially advanced structure-based drug discovery by predicting protein structures directly from sequences, and they have recently become increasingly accurate at predicting complexes formed by multiple protein chains. We evaluated how well these advances predict and model the largest receptor family, the G protein-coupled receptors (GPCRs), together with their cognate peptide hormones. We benchmarked DL tools, including AlphaFold 2.3 (AF2), AlphaFold 3 (AF3), Chai-1, NeuralPLexer, RoseTTAFold-AllAtom, Peptriever, ESMFold, and D-SCRIPT, on predicting interactions between GPCRs and their endogenous peptide ligands. Structure-aware models outperformed language models in peptide binding classification, with the top-performing model achieving an area under the curve of 0.86 on a benchmark set of 124 ligands and 1240 decoys. Rescoring predicted structures on local interactions further improved recovery of the principal ligand among decoy peptides, whereas DL-based approaches did not. We also explored a competitive tournament approach that models multiple peptides simultaneously on a single GPCR; it speeds up prediction but reduces true-positive recovery. When evaluating the binding poses of 67 recent complexes, AF2 reproduced the correct binding mode in nearly all cases (94%), surpassing both AF3 and Chai-1. Confidence scores correlated with the accuracy of the structural binding mode, providing a guide for interpreting interface predictions. These results demonstrate that DL models can reliably rediscover peptide binders, aid peptide drug discovery, and guide the selection of optimal tools for GPCR-targeted therapies. To this end, we provide a practical guide for selecting the best models for specific applications and an independent benchmarking set for future model evaluation.
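To make the classification metric above concrete, the following is a minimal sketch of how an area under the ROC curve and the rank of a true ligand among its decoys could be computed from per-peptide scores. It is not the authors' pipeline: the function names `roc_auc` and `rank_of_true_ligand`, the label convention (1 for a cognate ligand, 0 for a decoy), and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) identity.

    Hypothetical helper: labels are 1 for a true ligand and 0 for a decoy;
    a higher score is assumed to mean "more likely to bind".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one ligand and one decoy")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # ligand ranked above decoy
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def rank_of_true_ligand(true_score: float, decoy_scores: Sequence[float]) -> int:
    """1-based rank of the true ligand among its decoys for one receptor."""
    return 1 + sum(1 for d in decoy_scores if d > true_score)


# Toy scores only (not data from the study): two ligands and two decoys.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.8, 0.3, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_rank = rank_of_true_ligand(0.7, [0.9, 0.2, 0.1])
```

The pairwise form is quadratic in the number of peptides, which is acceptable for a benchmark of roughly a thousand scored pairs; a sorted-rank implementation would be preferable for much larger sets.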