Department of Mathematics, Michigan State University, MI 48824, USA.
Pfizer Medicine Design, 610 Main St, Cambridge, MA 02139, USA.
Phys Chem Chem Phys. 2020 Apr 29;22(16):8373-8390. doi: 10.1039/d0cp00305k.
Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory, have been paired with efficient machine learning, especially deep learning algorithms, to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprints are still valuable in computer-aided drug discovery. This work considers 23 datasets associated with four typical problems, namely protein-ligand binding, toxicity, solubility, and partition coefficient, to assess the performance of eight 2D fingerprints. Advanced machine learning algorithms, including random forest, gradient boosted decision tree, single-task deep neural network, and multitask deep neural network, are employed to construct efficient 2D-fingerprint-based models. Additionally, appropriate consensus models are built to further enhance the performance of 2D-fingerprint-based methods. It is demonstrated that 2D-fingerprint-based models perform as well as the state-of-the-art 3D structure-based models for predicting toxicity, solubility, partition coefficient, and protein-ligand binding affinity from ligand information alone. However, 3D structure-based models outperform 2D-fingerprint-based methods in complex-based protein-ligand binding affinity predictions.
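The abstract does not say how the consensus models combine their members. A common approach, assumed here purely for illustration, is to average the per-molecule predictions of several models, for example models trained on different 2D fingerprints or with different learners. The sketch below is a minimal version of that idea and may not match the authors' actual consensus scheme. The function names `consensus_predict` and `rmse` are hypothetical. The prediction values are made-up numbers chosen to show the mechanics; they are not results from the paper.

```python
from statistics import mean


def consensus_predict(model_predictions):
    """Unweighted consensus: average each molecule's predictions across models.

    Assumption: model_predictions is a list with one inner list per model,
    and every inner list holds one predicted value per molecule in the same order.
    """
    if not model_predictions:
        raise ValueError("need at least one model")
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all models must predict the same molecules")
    # zip(*...) groups the k-th prediction of every model together.
    return [mean(preds) for preds in zip(*model_predictions)]


def rmse(pred, true):
    """Root-mean-square error between predicted and reference values."""
    return (sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)) ** 0.5


# Hypothetical reference values and predictions from two hypothetical models.
y_true = [1.0, 2.0, 3.0]
rf_pred = [1.2, 1.8, 3.3]    # e.g. random forest on one fingerprint (made up)
gbdt_pred = [0.8, 2.3, 2.9]  # e.g. GBDT on another fingerprint (made up)

consensus = consensus_predict([rf_pred, gbdt_pred])
print("consensus:", consensus)
print("RMSE rf:", rmse(rf_pred, y_true))
print("RMSE gbdt:", rmse(gbdt_pred, y_true))
print("RMSE consensus:", rmse(consensus, y_true))
```

In this toy case the two models err in partly opposite directions, so averaging cancels some of the error. That cancellation is the usual motivation for consensus models, although it is not guaranteed when the member models make correlated errors.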