Jiménez-Luna José, Pérez-Benito Laura, Martínez-Rosell Gerard, Sciabola Simone, Torella Rubben, Tresadern Gary, De Fabritiis Gianni
Computational Science Laboratory, Parc de Recerca Biomèdica de Barcelona, Universitat Pompeu Fabra, C Dr Aiguader 88, Barcelona, 08003, Spain.
Laboratori de Medicina Computacional , Unitat de Bioestadística , Facultat de Medicina , Universitat Autònoma de Barcelona , Spain.
Chem Sci. 2019 Oct 16;10(47):10911-10918. doi: 10.1039/c9sc04606b. eCollection 2019 Dec 21.
The ability to rank candidate drug molecules by potency against a protein target has long been a fundamental challenge in computational chemistry because of its importance in drug design. While several simulation-based methodologies exist, they are hard to use prospectively, and thus predicting potency in lead optimization campaigns remains an open challenge. Here we present the first machine learning approach specifically tailored for ranking congeneric series, based on deep 3D-convolutional neural networks. Furthermore, we prove its effectiveness by blindly testing it on datasets provided by Janssen, Pfizer and Biogen totalling over 3246 ligands and 13 targets, as well as on several well-known openly available sets, representing one of the largest evaluations ever performed. We also performed online learning simulations of lead optimization, using the approach in a predictive manner and obtaining a significant advantage over experimental choice. We believe that the evaluation performed in this study is strong evidence of the usefulness of a modern deep learning model in lead optimization pipelines against more expensive simulation-based alternatives.