Toward Explainable Carcinogenicity Prediction: An Integrated Cheminformatics Approach and Consensus Framework for Possibly Carcinogenic Chemicals.

Author information

Duy Huynh Anh, Srisongkram Tarapong

Affiliations

Graduate School in the Program of Research and Development in Pharmaceuticals, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand.

Department of Health Sciences, College of Natural Sciences, Can Tho University, Can Tho 900000, Vietnam.

Publication information

J Chem Inf Model. 2025 Oct 13;65(19):10194-10220. doi: 10.1021/acs.jcim.5c01873. Epub 2025 Sep 12.

DOI: 10.1021/acs.jcim.5c01873
PMID: 40938794
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12529778/
Abstract

A carcinogenicity assessment of possibly carcinogenic chemicals (International Agency for Research on Cancer (IARC) Group 2B) was conducted using a consensus framework built from three complementary machine learning models: a BiLSTM with MACCS fingerprints, LightGBM with RDKit descriptors, and Random Forest (RF) with E-state features. The models were developed and rigorously evaluated on benchmark carcinogenicity data sets. LightGBM emerged as the top performer on the ISSCAN internal test set (accuracy = 0.800, MCC = 0.615, AUROC = 0.882, sensitivity = 0.739, specificity = 0.857), and RF and BiLSTM showed consistent, competitive performance, supporting the reliability of the individual models. Across the top three models on this set, MCC ranged from 0.564 to 0.615 and AUROC from 0.858 to 0.882. LightGBM also generalized well to independent human carcinogen test sets from IARC and IRIS (accuracy = 0.753, MCC = 0.535, AUROC = 0.842); on these sets the top three models attained MCC values of 0.335 to 0.535 and AUROC scores of 0.785 to 0.842. The consensus model was then applied to 47 within-domain compounds from the 2B category, classifying 16 as potential carcinogens, 8 as presumed noncarcinogens, and 23 as inconclusive. To uncover structural correlates of the predictions, a SHAP-based interpretation of the BiLSTM model was performed, revealing discriminative molecular features, including MACCS fingerprint keys and core Bemis-Murcko scaffolds, associated with predicted carcinogenicity. To support practical use, a freely accessible web server for carcinogenicity assessment has been developed and is available at https://carcinogenicity-predictor.streamlit.app.
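
The abstract reports accuracy, MCC, AUROC, sensitivity and specificity without defining them. The following minimal Python sketch (standard library only) shows the textbook definitions, assuming label 1 = carcinogen, 0 = noncarcinogen and a 0.5 decision threshold. The helper names and the ten toy compounds are hypothetical and are not the study's data or code. AUROC is computed with the pairwise (Mann-Whitney) formulation, which equals the trapezoidal area under the ROC curve when ties count as one half.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating label 1 as carcinogen and 0 as noncarcinogen."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true-negative rate
        # Convention: MCC is reported as 0 when any confusion-matrix margin is empty.
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 carcinogens followed by 6 noncarcinogens.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4, 0.35, 0.6, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auroc"] = auroc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy set the script prints accuracy 0.800, sensitivity 0.750, specificity 0.833, mcc 0.583 and auroc 0.875. The pairwise AUROC costs O(P·N) comparisons, which is negligible for benchmark sets of a few hundred compounds; in practice a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.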

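The abstract does not say how the three model outputs are combined into potential carcinogen, presumed noncarcinogen and inconclusive calls. One plausible reading, assumed here and not confirmed by the source, is a unanimity rule: a compound is a potential carcinogen only when the BiLSTM, LightGBM and RF models all predict the positive class, a presumed noncarcinogen only when all three predict the negative class, and inconclusive otherwise. The sketch below implements that hypothetical rule; the compound names, probabilities and 0.5 threshold are invented for illustration, and it assumes each compound has already passed an applicability-domain check.

```python
from collections import Counter
from enum import Enum
from typing import Sequence


class Call(Enum):
    POTENTIAL_CARCINOGEN = "potential carcinogen"
    PRESUMED_NONCARCINOGEN = "presumed noncarcinogen"
    INCONCLUSIVE = "inconclusive"


def consensus_call(probabilities: Sequence[float], threshold: float = 0.5) -> Call:
    """Hypothetical unanimity rule over per-model carcinogen probabilities."""
    if not probabilities:
        raise ValueError("need at least one model prediction")
    votes = [p >= threshold for p in probabilities]
    if all(votes):
        return Call.POTENTIAL_CARCINOGEN
    if not any(votes):
        return Call.PRESUMED_NONCARCINOGEN
    return Call.INCONCLUSIVE  # the models disagree


# Invented probabilities in the order (BiLSTM, LightGBM, RF); not data from the study.
predictions: dict[str, tuple[float, float, float]] = {
    "compound_A": (0.91, 0.84, 0.77),
    "compound_B": (0.12, 0.30, 0.25),
    "compound_C": (0.65, 0.41, 0.58),
}

calls = {name: consensus_call(probs) for name, probs in predictions.items()}
for name, call in calls.items():
    print(f"{name}: {call.value}")
print(Counter(call.value for call in calls.values()))
```

A design note: with three binary voters a plain majority vote always produces a decisive call, so an "inconclusive" category implies a stricter rule, such as the unanimity assumed here or an uncertainty band around the threshold.
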
Similar articles

1
Toward Explainable Carcinogenicity Prediction: An Integrated Cheminformatics Approach and Consensus Framework for Possibly Carcinogenic Chemicals.
J Chem Inf Model. 2025 Oct 13;65(19):10194-10220. doi: 10.1021/acs.jcim.5c01873. Epub 2025 Sep 12.
2
Toxicophore-informed machine learning integrating Tox21 assay readouts for organ system-specific carcinogenicity prediction.
Environ Pollut. 2026 Feb 1;390:127519. doi: 10.1016/j.envpol.2025.127519. Epub 2025 Dec 11.
3
Application of a developed triple-classification machine learning model for carcinogenic prediction of hazardous organic chemicals to the US, EU, and WHO based on Chinese database.
Ecotoxicol Environ Saf. 2023 Apr 15;255:114806. doi: 10.1016/j.ecoenv.2023.114806. Epub 2023 Mar 20.
4
In Silico Estimation of Chemical Carcinogenicity with Binary and Ternary Classification Methods.
Mol Inform. 2015 Apr;34(4):228-35. doi: 10.1002/minf.201400127. Epub 2015 Mar 27.
5
HDAC3_VS_assistant: cheminformatics-driven discovery of histone deacetylase 3 inhibitors.
Mol Divers. 2024 Dec 23. doi: 10.1007/s11030-024-11066-6.
6
Combining machine learning models of in vitro and in vivo bioassays improves rat carcinogenicity prediction.
Regul Toxicol Pharmacol. 2018 Apr;94:8-15. doi: 10.1016/j.yrtph.2018.01.008. Epub 2018 Jan 11.
7
The comet assay with multiple mouse organs: comparison of comet assay results and carcinogenicity with 208 chemicals selected from the IARC monographs and U.S. NTP Carcinogenicity Database.
Crit Rev Toxicol. 2000 Nov;30(6):629-799. doi: 10.1080/10408440008951123.
8
New clues on carcinogenicity-related substructures derived from mining two large datasets of chemical compounds.
J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2016 Apr 2;34(2):97-113. doi: 10.1080/10590501.2016.1166879.
9
CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods.
Sci Rep. 2017 May 18;7(1):2118. doi: 10.1038/s41598-017-02365-0.
10
Prediction of carcinogenicity for diverse chemicals based on substructure grouping and SVM modeling.
Mol Divers. 2010 Nov;14(4):789-802. doi: 10.1007/s11030-010-9232-y. Epub 2010 Feb 26.
