School of Biotechnology and Health Sciences, Wuyi University, Dongcheng Village, Jiangmen 529020, China.
Molecules. 2023 Mar 2;28(5):2326. doi: 10.3390/molecules28052326.
In recent years, machine learning methods have been applied successfully in many fields. In this paper, three machine learning algorithms, namely partial least squares-discriminant analysis (PLS-DA), adaptive boosting (AdaBoost), and light gradient boosting machine (LGBM), were applied to establish models for predicting five absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of anti-breast cancer compounds: Caco-2, CYP3A4, hERG, HOB, and MN. To the best of our knowledge, this is the first time the LGBM algorithm has been applied to classify the ADMET properties of anti-breast cancer compounds. We evaluated the established models on the prediction set using accuracy, precision, recall, and F1-score. Of the three algorithms, LGBM yielded the most satisfactory results (accuracy > 0.87, precision > 0.72, recall > 0.73, and F1-score > 0.73). These results suggest that LGBM can establish reliable models for predicting molecular ADMET properties and can provide a useful tool for researchers in virtual screening and drug design.
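The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each ADMET property is treated as a binary label (1 = positive class), which the abstract does not state explicitly. The function name `binary_classification_metrics` and the example labels are hypothetical and are not taken from the paper.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1-score for binary labels.

    Assumption: labels are binary and `positive` marks the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for one property on a small prediction set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy while missing most of the positive class. That may be why the reported accuracy threshold is higher than the other three, although the abstract does not say so.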