Machine-learning based prediction models for assessing skin irritation and corrosion potential of liquid chemicals using physicochemical properties by XGBoost.

Author information

Kang Yeonsoo, Kim Myeong Gyu, Lim Kyung-Min

Affiliation

College of Pharmacy, Ewha Womans University, Seoul, 03760 Republic of Korea.

Publication information

Toxicol Res. 2023 Jan 23;39(2):295-305. doi: 10.1007/s43188-022-00168-8. eCollection 2023 Apr.

DOI:10.1007/s43188-022-00168-8

PMID:37008690

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10050629/

Abstract

Skin irritation testing is an essential part of the safety assessment of chemicals. Recently, computational models that predict skin irritation have drawn attention as alternatives to animal testing. We developed prediction models for the skin irritation/corrosion of liquid chemicals using machine learning algorithms, with 34 physicochemical descriptors calculated from the chemical structure. A training and test dataset of 545 liquid chemicals with reliable in vivo skin hazard classifications based on the UN Globally Harmonized System [category 1 (corrosive, Cat 1), 2 (irritant, Cat 2), 3 (mild irritant, Cat 3), and no category (nonirritant, NC)] was collected from public databases. After curation of the input data through removal and correlation analysis, every model was constructed to predict the skin hazard classification of liquid chemicals with 22 physicochemical descriptors. Seven machine learning algorithms [logistic regression, naïve Bayes, k-nearest neighbor, support vector machine, random forest, extreme gradient boosting (XGB), and neural network] were applied to ternary and binary classification of skin hazard. The XGB model demonstrated the highest accuracy (0.73-0.81), sensitivity (0.71-0.92), and positive predictive value (0.65-0.81). The contribution of the physicochemical descriptors to the classification was analyzed using Shapley Additive exPlanations (SHAP) plots to provide insight into the skin irritation of chemicals.
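The three performance measures reported above (accuracy, sensitivity, and positive predictive value) can be illustrated for the binary case with a minimal sketch. This is not the authors' code: the function name `binary_metrics`, the label encoding (1 = irritant/corrosive, 0 = no category), and the toy labels are all assumptions made for illustration only.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and positive predictive value (PPV).

    Hypothetical helper for illustration; `positive` marks the hazard class.
    """
    # Tally each (is truly positive, is predicted positive) combination.
    counts = Counter(
        (truth == positive, pred == positive)
        for truth, pred in zip(y_true, y_pred)
    )
    tp = counts[(True, True)]    # hazardous, predicted hazardous
    fn = counts[(True, False)]   # hazardous, predicted non-hazardous
    fp = counts[(False, True)]   # non-hazardous, predicted hazardous
    tn = counts[(False, False)]  # non-hazardous, predicted non-hazardous
    total = tp + fn + fp + tn
    nan = float("nan")
    return {
        # Fraction of all chemicals classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Fraction of truly hazardous chemicals that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of predicted-hazardous chemicals that truly are hazardous.
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
    }


# Toy labels (assumed, not from the study): 1 = irritant/corrosive, 0 = NC.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a safety screening context, sensitivity is usually the measure to watch most closely, since a missed irritant (false negative) is costlier than a false alarm.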

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s43188-022-00168-8.
