CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods.

Affiliations

School of Life Science, Liaoning University, Shenyang, 110036, China.

Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China.

Publication information

Sci Rep. 2017 May 18;7(1):2118. doi: 10.1038/s41598-017-02365-0.

PMID: 28522849
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5437031/
Abstract

Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
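The abstract reports accuracy, sensitivity, and specificity as mean ± standard deviation over five-fold cross-validation, and describes each ensemble as combining base classifiers trained on different molecular fingerprints. The Python sketch below illustrates those quantities under stated assumptions; it is not the authors' code. It assumes each fingerprint-specific base model outputs a probability that a compound is a carcinogen, that the ensemble averages these probabilities (soft voting) against a hypothetical 0.5 threshold, and that the reported spread is the sample standard deviation across folds. The abstract does not state the combination rule, so the names soft_vote, classification_metrics, and summarize_folds, the threshold, and the toy probabilities are all hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def _safe_div(num: float, den: float) -> float:
    # Return NaN instead of raising when a class is absent from a fold.
    return num / den if den else float("nan")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, with 1 = carcinogen, 0 = non-carcinogen."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": _safe_div(tp, tp + fn),  # fraction of carcinogens detected
        "specificity": _safe_div(tn, tn + fp),  # fraction of non-carcinogens rejected
    }


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical ensemble rule: average per-fingerprint P(carcinogen), then threshold."""
    if not prob_lists or len({len(p) for p in prob_lists}) != 1:
        raise ValueError("need one equal-length probability list per base model")
    n_compounds = len(prob_lists[0])
    return [
        1 if mean([p[i] for p in prob_lists]) >= threshold else 0
        for i in range(n_compounds)
    ]


def summarize_folds(fold_metrics: Sequence[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each metric across CV folds (needs >= 2 folds)."""
    summary = {}
    for key in fold_metrics[0]:
        values = [m[key] for m in fold_metrics]
        summary[key] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    # Toy data: three hypothetical fingerprint-specific models scoring six compounds.
    y_true = [1, 1, 1, 0, 0, 0]
    probs = [
        [0.9, 0.6, 0.4, 0.2, 0.7, 0.1],
        [0.8, 0.4, 0.3, 0.3, 0.6, 0.2],
        [0.7, 0.7, 0.2, 0.1, 0.4, 0.3],
    ]
    y_pred = soft_vote(probs)
    print(y_pred)  # prints [1, 1, 0, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Multiplying the summarize_folds output by 100 gives figures in the same "70.1 ± 2.9%" style as the abstract. Averaging probabilities is only one plausible reading of "ensemble"; majority voting over hard labels or a stacked meta-classifier would fit the same description, and the full paper is the place to confirm which rule CarcinoPred-EL actually uses.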
