Zhang Keqiong, Fan Zhiran, Wu Qilong, Liu Jianfeng, Huang Sheng-You
School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China.
Key Laboratory of Molecular Biophysics of MOE, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China.
J Chem Inf Model. 2025 Jul 14;65(13):7174-7192. doi: 10.1021/acs.jcim.5c00427. Epub 2025 Jun 17.
Accurate prediction of drug-protein interactions is crucial for drug discovery. Because traditional scoring functions have become a bottleneck, many machine learning scoring functions (MLSFs) have been proposed for structure-based drug screening. However, existing MLSFs face two challenges: limited training data and poor interpretability. To address these challenges, we propose DrugBaiter, a physics-based small-data machine learning framework for interpretable and generalizable prediction of drug-protein interactions on targets with scarce positive data. DrugBaiter is trained in three phases with three loss functions (score, weight, and ranking). It has been extensively evaluated for drug screening on the 102 targets of DUD-E and the 81 targets of DEKOIS 2.0 and compared with 14 other MLSFs. The results show that DrugBaiter significantly improves drug screening performance even when few actives are known for a target. In addition, DrugBaiter is interpretable, describing interactions at the atomic level. Its power is further confirmed by a drug screening application on the SARS-CoV-2 main protease target. We anticipate that DrugBaiter will serve as a general machine learning scoring model for screening novel drugs on new targets with scarce known actives. DrugBaiter is freely available at http://huanglab.phys.hust.edu.cn/DrugBaiter.
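The abstract names a ranking loss among the three training losses but does not give its form. As a rough illustration of what a ranking objective for active-versus-decoy screening can look like, the following is a minimal sketch of a generic pairwise hinge (margin) ranking loss. It is not DrugBaiter's actual loss: the function name `pairwise_ranking_loss`, the `margin` parameter, and the convention that a higher score means a stronger predicted interaction are all assumptions made for this example.

```python
from typing import Sequence


def pairwise_ranking_loss(
    active_scores: Sequence[float],
    decoy_scores: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Mean hinge loss over all (active, decoy) score pairs.

    Hypothetical illustration, not the published DrugBaiter loss.
    Assumes a higher score means a stronger predicted interaction.
    A pair contributes zero once the active outscores the decoy
    by at least `margin`.
    """
    if len(active_scores) == 0 or len(decoy_scores) == 0:
        raise ValueError("need at least one active and one decoy score")
    total = 0.0
    for a in active_scores:
        for d in decoy_scores:
            # Penalize pairs where the active does not beat the decoy by the margin.
            total += max(0.0, margin - (a - d))
    return total / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    # Two known actives and three decoys for one target (made-up scores).
    print(pairwise_ranking_loss([2.5, 0.4], [1.0, 0.2, -0.3]))
```

A loss of this kind only needs relative orderings, so even a handful of known actives yields many training pairs when combined with a large decoy set, which is one reason ranking objectives are attractive when positive data are scarce.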