Application of a developed triple-classification machine learning model for carcinogenic prediction of hazardous organic chemicals to the US, EU, and WHO based on Chinese database.

Affiliations

College of New Energy and Environment, Jilin University, Changchun 130012, China.

Publication information

Ecotoxicol Environ Saf. 2023 Apr 15;255:114806. doi: 10.1016/j.ecoenv.2023.114806. Epub 2023 Mar 20.

Abstract

Cancer, the second leading human disease, has become a major public health problem, so predicting the carcinogenicity of chemicals before they are synthesized is crucial. In this paper, seven machine learning algorithms (i.e., Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), Complement Naive Bayes (CNB), K-Nearest Neighbor (KNN), XGBoost, and Multilayer Perceptron (MLP)) were used to construct a carcinogenicity triple-classification prediction (TCP) model covering Categories 1A, 1B, and 2. A total of 1444 descriptors for 118 hazardous organic chemicals were calculated with Discovery Studio 2020, Sybyl X-2.0, and PaDEL-Descriptor. The constructed TCP models were evaluated with five indicators (i.e., Accuracy, Precision, Recall, F1 Score, and AUC), all of which met the acceptance requirement (greater than 0.6). The accuracy of the RF, LR, XGBoost, and MLP models in predicting Category 2 carcinogenicity is 91.67%, 79.17%, 100%, and 100%, respectively. In addition, the machine learning models constructed in this study show potential for error correction. Taking the XGBoost model as an example, the predicted carcinogenicity level of 1,2,3-Trichloropropane (CAS 96-18-4) is Category 2, whereas the actual level is 1B. However, the difference between the model's scores for Category 2 and 1B is only 0.004, indicating that XGBoost is one of the best-performing of the seven constructed models. The results also showed that functional groups such as chlorine and the benzene ring might influence the predicted carcinogenicity classification; we therefore recommend considering the functional-group characteristics of chemicals before constructing carcinogenicity prediction models for organic chemicals. The carcinogenicity predicted by the optimal machine learning model (i.e., XGBoost) was also evaluated and verified using toxicokinetics.
The RF and XGBoost TCP models constructed in this paper can be used to screen for carcinogenicity before new organic substances are synthesized, and they provide technical support for the subsequent management of organic chemicals.
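The abstract reports five indicators, a 0.6 acceptance threshold, per-category accuracy, and a 0.004 gap between the Category 2 and 1B outcomes for 1,2,3-Trichloropropane. Below is a minimal, dependency-free Python sketch of how such figures could be computed for a three-class problem; it is not the authors' code. It assumes macro averaging for Precision, Recall and F1, one-vs-rest AUC averaged over the classes, per-class scores that behave like probabilities, and that "accuracy for Category 2" means the share of Category 2 chemicals classified correctly. The abstract states none of these choices, and every function name and the toy scores are hypothetical.

```python
# Hypothetical sketch: macro averaging and one-vs-rest AUC are assumptions.
from typing import Dict, List, Sequence, Tuple

CLASSES: Tuple[str, ...] = ("1A", "1B", "2")  # score columns follow this order


def argmax_labels(proba: Sequence[Sequence[float]]) -> List[str]:
    """Turn per-class scores into predicted labels (highest score wins)."""
    return [CLASSES[max(range(len(CLASSES)), key=row.__getitem__)] for row in proba]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracy(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """Share of chemicals truly in `cls` predicted as `cls` (assumed reading of
    'accuracy for Category 2'; identical to per-class recall)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    return sum(t == p for t, p in pairs) / len(pairs)


def macro_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 (0.0 where a ratio is undefined)."""
    ps, rs, fs = [], [], []
    for c in CLASSES:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(CLASSES)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: chance a random positive outscores a random negative
    (ties count 0.5). Needs at least one positive and one negative sample."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], proba: Sequence[Sequence[float]]) -> float:
    """One-vs-rest AUC for each class, averaged over the three classes."""
    aucs = [
        binary_auc([int(t == c) for t in y_true], [row[i] for row in proba])
        for i, c in enumerate(CLASSES)
    ]
    return sum(aucs) / len(aucs)


def evaluate(y_true: Sequence[str], proba: Sequence[Sequence[float]]) -> Dict[str, float]:
    y_pred = argmax_labels(proba)
    prec, rec, f1 = macro_prf(y_true, y_pred)
    return {"accuracy": accuracy(y_true, y_pred), "precision": prec,
            "recall": rec, "f1": f1, "auc": macro_ovr_auc(y_true, proba)}


def passes_threshold(metrics: Dict[str, float], threshold: float = 0.6) -> bool:
    """True when every indicator exceeds the threshold (the > 0.6 criterion)."""
    return all(v > threshold for v in metrics.values())


def top_two_margin(row: Sequence[float]) -> Tuple[str, str, float]:
    """Best class, runner-up class and their score gap; a tiny gap flags a
    near-miss such as a 1B chemical predicted as Category 2."""
    order = sorted(range(len(CLASSES)), key=lambda i: row[i], reverse=True)
    return CLASSES[order[0]], CLASSES[order[1]], row[order[0]] - row[order[1]]


# Hypothetical toy scores (columns: 1A, 1B, 2); not data from the paper.
y_true = ["1A", "1A", "1B", "1B", "2", "2"]
proba = [
    [0.80, 0.15, 0.05],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.448, 0.452],  # true 1B, predicted 2 by a 0.004 margin
    [0.05, 0.15, 0.80],
    [0.10, 0.20, 0.70],
]

metrics = evaluate(y_true, proba)
print({k: round(v, 4) for k, v in metrics.items()})
# → {'accuracy': 0.8333, 'precision': 0.8889, 'recall': 0.8333, 'f1': 0.8222, 'auc': 1.0}
print("all indicators > 0.6:", passes_threshold(metrics))
print("Category 2 accuracy:", class_accuracy(y_true, argmax_labels(proba), "2"))
best, runner, gap = top_two_margin(proba[3])
print(f"predicted {best}, runner-up {runner}, gap {gap:.3f}")
```

Macro averaging weights 1A, 1B and 2 equally, which matters when the categories are unevenly represented in a small set such as 118 chemicals; a support-weighted average would instead lean toward the majority category.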
