ACLPred: an explainable machine learning and tree-based ensemble model for anticancer ligand prediction.

Authors

Yadav Arvind Kumar, Kim Jun-Mo

Affiliation

Functional Genomics & Bioinformatics Laboratory, Department of Animal Science and Technology, Chung-Ang University, Anseong, 17546, Gyeonggi-do, Republic of Korea.

Publication

Sci Rep. 2025 Aug 25;15(1):31268. doi: 10.1038/s41598-025-16575-4.

Abstract

Several small molecules have been approved for cancer treatment, but the steadily growing number of cancer cases continues to drive the search for new anticancer compounds. Experimental methods are costly and time-consuming, so rapid and cost-effective alternatives are needed. Identifying anticancer compounds with machine learning (ML) offers a promising solution that reduces both time and cost. In this study, small molecules with known inhibitory activities, both anticancer and non-anticancer, were used to train classification models. Molecular descriptors were calculated, and multistep feature selection was applied to identify significant features. Multiple ML algorithms were used to build classification models, and their performance was evaluated on independent test and external datasets. Tree-based ensemble models, particularly the Light Gradient Boosting Machine (LGBM), achieved the highest prediction accuracy (90.33%), with an area under the receiver operating characteristic curve (AUROC) of 97.31%. Consequently, the LGBM model was implemented in the proposed method, ACLPred. ACLPred demonstrated superior prediction accuracy and good generalizability compared with existing methods. SHapley Additive exPlanations (SHAP) analysis provided model interpretability and revealed that topological features made the major contributions to the model's decisions. ACLPred is available as an open-source tool with a user-friendly graphical interface at https://github.com/ArvindYadav7/ACLPred for screening potential anticancer compounds.
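The abstract reports two metrics, accuracy and AUROC, without defining them. The minimal sketch below shows how both are commonly computed for a binary anticancer / non-anticancer classifier. It is not taken from ACLPred: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the numbers have no relation to the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive receives
    a higher score than a randomly chosen negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = anticancer, 0 = non-anticancer.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical model scores, e.g. predicted probability of class 1.
scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.77, 0.20]

# Assumed decision threshold of 0.5 to turn scores into class labels.
predicted = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(labels, predicted)
auc = auroc(labels, scores)
print(f"accuracy = {acc:.4f}, AUROC = {auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUROC depends only on how the scores rank positives against negatives, which is why a model can show a higher AUROC than accuracy, as in the figures reported above.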

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6c7/12378186/5777742cb133/41598_2025_16575_Fig1_HTML.jpg
