Kang Yeonsoo, Kim Myeong Gyu, Lim Kyung-Min
College of Pharmacy, Ewha Womans University, Seoul, 03760 Republic of Korea.
Toxicol Res. 2023 Jan 23;39(2):295-305. doi: 10.1007/s43188-022-00168-8. eCollection 2023 Apr.
The skin irritation test is an essential part of the safety assessment of chemicals. Computational models that predict skin irritation have recently drawn attention as alternatives to animal testing. We developed models to predict the skin irritation/corrosion of liquid chemicals using machine learning algorithms and 34 physicochemical descriptors calculated from chemical structure. A training and test dataset of 545 liquid chemicals with reliable in vivo skin hazard classifications based on the UN Globally Harmonized System [category 1 (corrosive, Cat 1), 2 (irritant, Cat 2), 3 (mild irritant, Cat 3), and no category (nonirritant, NC)] was collected from public databases. After curation of the input data through removal and correlation analysis, each model was built to predict the skin hazard classification of liquid chemicals using 22 physicochemical descriptors. Seven machine learning algorithms [logistic regression, naïve Bayes, k-nearest neighbor, support vector machine, random forest, extreme gradient boosting (XGB), and neural network] were applied to ternary and binary classification of skin hazard. The XGB model demonstrated the highest accuracy (0.73-0.81), sensitivity (0.71-0.92), and positive predictive value (0.65-0.81). The contribution of the physicochemical descriptors to the classification was analyzed using Shapley Additive exPlanations (SHAP) plots to provide insight into the skin irritation of chemicals.
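The three performance measures reported above follow the standard confusion-matrix definitions. As a minimal sketch, the Python snippet below computes them for a binary classification. The function name `binary_metrics`, the 1/0 label coding, and the example labels are all hypothetical; the abstract does not state which hazard categories were grouped as the positive class, and none of the numbers below come from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and positive predictive value (PPV).

    Labels are assumed to be coded 1 (positive, e.g. a hazard class)
    and 0 (negative, e.g. no category). This coding is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # Fraction of all chemicals classified correctly
        "accuracy": (tp + tn) / total if total else nan,
        # Fraction of true positives that the model detects
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of predicted positives that are truly positive
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical labels for illustration only (not data from the study)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a ternary classification, the same quantities would usually be computed per class in a one-vs-rest fashion and then averaged; whether the study did so is not stated in the abstract.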
The online version contains supplementary material available at 10.1007/s43188-022-00168-8.