Long Yuxi, Donald Bruce R
Department of Computer Science, Department of Mathematics, Duke University, Durham, North Carolina, United States of America.
Department of Biochemistry, Department of Chemistry, Duke University and Duke University School of Medicine, Durham, North Carolina, United States of America.
PLoS Comput Biol. 2025 Jun 27;21(6):e1013216. doi: 10.1371/journal.pcbi.1013216. eCollection 2025 Jun.
Accurate binding affinity prediction (BAP) is crucial to structure-based drug design. We present PATH+, a novel, generalizable machine learning algorithm for BAP that exploits recent advances in computational topology. Compared to current binding affinity prediction algorithms, PATH+ shows similar or better accuracy and is more generalizable across orthogonal datasets. PATH+ is not only one of the most accurate algorithms for BAP, it is also the first algorithm that is inherently interpretable. Interpretability is a key factor of trust for an algorithm and, alongside generalizability, allows PATH+ to be trusted in critical applications such as inhibitor design. We visualized the features captured by PATH+ for two clinically relevant protein-ligand complexes and found that PATH+ captures binding-relevant structural mutations that are corroborated by biochemical data. Our work also sheds light on the features captured by current computational topology BAP algorithms that contribute to their high performance, which have been poorly understood. PATH+ also offers an improvement of 𝒪((m + n)^3) in computational complexity and is empirically over 10 times faster than the dominant (uninterpretable) computational topology algorithm for BAP. Based on insights from PATH+, we built PATH-, a scoring function for differentiating between binders and non-binders that has outstanding accuracy compared with 11 current algorithms for BAP. In summary, we report progress in a novel combination of interpretability, speed, and accuracy that should further empower topological screening of large virtual inhibitor libraries against protein targets, and allow binding affinity predictions to be understood and trusted. The source code for PATH+ and PATH- is released as open source as part of the OSPREY protein design software package.