Yu Xinxin, Wang Yimeng, Chen Long, Li Weihua, Tang Yun, Liu Guixia
Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
J Pharm Anal. 2025 Aug;15(8):101317. doi: 10.1016/j.jpha.2025.101317. Epub 2025 Apr 21.
Activity cliffs (ACs) are generally defined as pairs of similar compounds that differ only by a minor structural modification but exhibit a large difference in binding affinity for a given target. ACs offer crucial insights that help medicinal chemists optimize molecular structures. However, they are also a major source of prediction error in structure-activity relationship (SAR) models. Several studies have shown that deep neural networks based on molecular images or graphs may still need further improvement in predicting the potency of ACs. In this paper, we integrated the triplet loss used in face recognition with a pre-training strategy to develop ACtriplet, a prediction model tailored for ACs. In extensive comparisons with multiple baseline models on 30 benchmark datasets, ACtriplet performed significantly better than deep learning (DL) models without pre-training. We also explored the effect of pre-training on data representation. Finally, a case study demonstrated that the model's interpretability module could reasonably explain its prediction results. Because the amount of available data cannot be increased rapidly, this framework makes better use of existing data and could help realize the potential of DL in the early stages of drug discovery and optimization.
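The abstract names the triplet loss from face recognition but does not spell out its form. The block below is a minimal sketch that assumes the standard FaceNet-style formulation: squared Euclidean distance between embeddings and a fixed margin. The margin value, the helper names, and the way a triplet is chosen are illustrative assumptions, not ACtriplet's published settings. In a hypothetical AC setting, the anchor and positive would be compounds with similar potency, and the negative would be a structurally similar cliff partner whose potency differs sharply. The loss is zero once the negative sits at least `margin` farther from the anchor than the positive does.

```python
from typing import List, Sequence, Tuple

Embedding = Sequence[float]


def squared_euclidean(u: Embedding, v: Embedding) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Embedding, positive: Embedding,
                 negative: Embedding, margin: float = 1.0) -> float:
    """Standard triplet loss for one triplet: max(d(a, p) - d(a, n) + margin, 0).

    The margin of 1.0 is an illustrative default (assumption), not a value from the paper.
    """
    d_ap = squared_euclidean(anchor, positive)  # pull anchor and positive together
    d_an = squared_euclidean(anchor, negative)  # push anchor and negative apart
    return max(d_ap - d_an + margin, 0.0)


def mean_triplet_loss(triplets: List[Tuple[Embedding, Embedding, Embedding]],
                      margin: float = 1.0) -> float:
    """Average triplet loss over a batch of (anchor, positive, negative) triplets."""
    if not triplets:
        raise ValueError("need at least one triplet")
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.1, 0.0]
    cliff_partner_close = [0.2, 0.0]  # embedded almost on top of the anchor: penalized
    cliff_partner_far = [3.0, 0.0]    # already separated by more than the margin: no loss
    print(triplet_loss(anchor, positive, cliff_partner_close))
    print(triplet_loss(anchor, positive, cliff_partner_far))
```

In this toy example, the close negative yields a loss of about 0.97 (0.01 - 0.04 + 1.0), and the far negative yields 0.0. Minimizing this quantity pushes structurally similar but differently potent compounds apart in embedding space. That is the intuition behind pairing a contrastive objective with AC prediction, though the paper's actual triplet construction and distance may differ.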