Jung Jinwoo, Moon Jeon-Ok, Ahn Song Ih, Lee Haeseung
Department of Pharmacy, College of Pharmacy and Research Institute for Drug Development, Pusan National University, Busan 46241, Korea.
School of Mechanical Engineering, Pusan National University, Busan 46241, Korea.
Korean J Physiol Pharmacol. 2024 Nov 1;28(6):527-537. doi: 10.4196/kjpp.2024.28.6.527.
Oxidative stress is a well-established risk factor for numerous chronic diseases, which underscores the need for efficient identification of potent antioxidants. Conventional methods for assessing antioxidant properties are time-consuming and resource-intensive, typically relying on laborious biochemical assays. In this study, we investigated whether machine learning (ML) algorithms can predict the antioxidant activity of compounds based solely on their molecular structure. We evaluated five ML algorithms, Support Vector Machine (SVM), Logistic Regression (LR), XGBoost, Random Forest (RF), and Deep Neural Network (DNN), on a dataset of over 1,900 compounds with experimentally determined antioxidant activity. RF and SVM achieved the best overall performance, exhibiting high accuracy (> 0.9) and effectively distinguishing active from inactive compounds even when their structures were highly similar. External validation using natural product data from the BATMAN database confirmed the generalizability of the RF and SVM models. Our results suggest that ML models can serve as powerful tools to expedite the discovery of novel antioxidant candidates, potentially streamlining the development of future therapeutic interventions.
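The two evaluation ideas in the abstract, classification accuracy and the separation of active from inactive compounds that are structurally similar, can be illustrated with a minimal standard-library sketch. Everything below is an assumption for illustration and is not taken from the study: the abstract does not say how structures were encoded or how similarity was measured, so the sketch uses sets of "on" fingerprint bits and the Tanimoto coefficient, a common choice in cheminformatics. The toy fingerprints, labels, predictions, threshold, and function names are all hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / union if union else 1.0


def accuracy(y_true, y_pred) -> float:
    """Fraction of compounds whose predicted label matches the measured label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def similar_pairs_with_opposite_labels(fingerprints, labels, threshold):
    """Return (i, j, similarity) for active/inactive pairs at or above the threshold."""
    pairs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        if labels[i] != labels[j]:
            sim = tanimoto(fingerprints[i], fingerprints[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


# Hypothetical toy data: 1 = active antioxidant, 0 = inactive.
fingerprints = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 4, 6}),
]
labels = [1, 0, 0, 1]
predictions = [1, 0, 0, 0]  # output of some hypothetical classifier

hard_pairs = similar_pairs_with_opposite_labels(fingerprints, labels, threshold=0.6)
print("accuracy:", accuracy(labels, predictions))
print("similar pairs with opposite activity:", hard_pairs)
```

Pairs returned by `similar_pairs_with_opposite_labels` are the cases where a structure-based model is most likely to fail, so checking predictions on them is one way to probe the kind of discrimination the abstract reports. In practice the fingerprints would come from a cheminformatics toolkit and the predictions from a trained model such as RF or SVM.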