Khan Muhammad Zafar Irshad, Ren Jia-Nan, Cao Cheng, Ye Hong-Yu-Xiang, Wang Hao, Guo Ya-Min, Yang Jin-Rong, Chen Jian-Zhong
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
Polytechnic Institute, Zhejiang University, Hangzhou, China.
Front Pharmacol. 2024 Aug 21;15:1441587. doi: 10.3389/fphar.2024.1441587. eCollection 2024.
Chemicals can cause acute liver injury, posing a serious threat to human health. Establishing a precise safety profile for a compound is challenging because testing procedures are complex and expensive. In silico approaches can help identify the potential risk of drug candidates at an early stage of drug development, thereby reducing development costs.
In the current study, QSAR models for hepatotoxicity prediction were developed using an ensemble strategy that integrates machine learning (ML) and deep learning (DL) algorithms across various molecular features. A large dataset of 2588 chemicals and drugs was randomly divided into training (80%) and test (20%) sets. Individual base models were then trained with diverse ML or DL algorithms on three different kinds of descriptors and fingerprints. Feature selection approaches were applied to optimize the models according to their performance. Hybrid ensemble approaches were then compared to determine the best-performing method.
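The split-and-vote workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed random seed, the 0.5 decision threshold, and the tie-breaking rule are all assumptions made for illustration, and the abstract does not state whether hard or soft voting was used, so both are shown.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    items: Sequence, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List, List]:
    """Randomly split items into (train, test); the seed is an assumed value."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def soft_vote(probabilities: Sequence[float], threshold: float = 0.5) -> int:
    """Average the base models' predicted probabilities of hepatotoxicity,
    then label the compound toxic (1) if the mean reaches the threshold."""
    mean_probability = sum(probabilities) / len(probabilities)
    return int(mean_probability >= threshold)


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over the base models' 0/1 labels.
    Ties resolve to 0 (non-toxic); this tie rule is an assumption."""
    return int(2 * sum(labels) > len(labels))


# Compound indices stand in for the 2588 molecules in the dataset.
train_ids, test_ids = split_train_test(range(2588))

# Hypothetical outputs of three base models for a single compound.
print(soft_vote([0.9, 0.4, 0.6]), hard_vote([1, 0, 1]))
```

With 2588 compounds, an 80/20 split yields roughly 2070 training and 518 test compounds; the exact counts depend on how the fraction is rounded.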
The voting ensemble classifier emerged as the optimal model, achieving a prediction accuracy of 80.26%, an AUC of 82.84%, and a recall of over 93%, followed by the bagging and stacking ensemble classifiers. The model was further verified with an external test set, internal 10-fold cross-validation, and rigorous benchmarking, and showed considerably better reliability than previously published models.
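For readers less familiar with the three reported metrics, the following sketch shows how accuracy, recall, and AUC are conventionally computed for a binary toxic/non-toxic classifier. The function names and the toy labels and scores are illustrative assumptions, not data from the study. AUC is computed here as the fraction of (toxic, non-toxic) pairs in which the toxic compound receives the higher score, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly toxic compounds (label 1) that are predicted toxic."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = hepatotoxic, 0 = non-hepatotoxic.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

print(accuracy(y_true, y_pred), recall(y_true, y_pred), roc_auc(y_true, scores))
```

A high recall alongside a lower accuracy, as reported above, indicates that the model rarely misses toxic compounds but flags some non-toxic compounds as toxic.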
The proposed ensemble model provides a dependable, well-performing assessment of the risk that chemicals and drugs will induce liver damage.