The development of classification-based machine-learning models for the toxicity assessment of chemicals associated with plastic packaging.

Authors

Hossain Md Mobarak, Roy Kunal

Affiliation

Drug Theoretics and Cheminformatics (DTC) Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.

Publication information

J Hazard Mater. 2025 Feb 15;484:136702. doi: 10.1016/j.jhazmat.2024.136702. Epub 2024 Nov 30.

Abstract

Assessing chemical toxicity in materials like plastic packaging is critical to safeguarding public health. This study presents the development of classification-based machine-learning models to predict the toxicity of chemicals associated with plastic packaging. Using an extensive dataset of chemical structures, we trained multiple machine-learning models (Random Forest, Support Vector Machine, Linear Discriminant Analysis, and Logistic Regression) targeting endpoints such as Neurotoxicity, Hepatotoxicity, Dermatotoxicity, Carcinogenicity, Reproductive Toxicity, Skin Sensitization, and Toxic Pneumonitis. The dataset was pre-processed by selecting 2D molecular descriptors as feature inputs, with resampling methods (ADASYN, Borderline SMOTE, Random Over Sampler, SVMSMOTE, Cluster Centroid, Near Miss, Random Under Sampler) applied to balance the classes for accurate classification. Five-fold cross-validation was used to optimize model performance, with model parameters fine-tuned using grid search. Model performance was evaluated using accuracy (Acc), sensitivity (Se), specificity (Sp), and area under the receiver operating characteristic curve (AUC-ROC). In most cases, model accuracy was 0.8 or above for both the training and test sets. Additionally, SHAP (SHapley Additive exPlanations) values were used for feature importance analysis, highlighting the descriptors that contribute most to the toxicity predictions. The models were ranked using the Sum of Ranking Differences (SRD) method to systematically select the most effective model. The optimal models demonstrated high predictive accuracy and interpretability, providing a scalable and efficient solution for toxicity assessment compared with traditional methods. This approach offers a valuable tool for rapidly screening potentially hazardous chemicals in plastic packaging.
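The evaluation metrics and the SRD ranking named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The labels, scores, metric table, and model names are made-up placeholders, and the choice of the row-wise maximum as the SRD reference is an assumption, since the abstract does not state which reference was used.

```python
# Illustrative sketch only: all numbers below are invented, not study results.

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Acc": (tp + tn) / len(pairs),
        "Se": tp / (tp + fn),   # true positive rate
        "Sp": tn / (tn + fp),   # true negative rate
    }

def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fractional_ranks(values):
    """Ascending 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def srd(model_values, reference):
    """Sum of absolute rank differences between one model and the reference."""
    model_ranks = fractional_ranks(model_values)
    ref_ranks = fractional_ranks(reference)
    return sum(abs(m - r) for m, r in zip(model_ranks, ref_ranks))

# Hypothetical predictions for ten chemicals, thresholded at 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)

# Hypothetical metric table: one list per model, one entry per metric row.
table = {
    "model_a": [0.85, 0.80, 0.88, 0.90, 0.82],
    "model_b": [0.80, 0.84, 0.79, 0.86, 0.81],
}
# Assumed reference: the best (maximum) value in each row.
reference = [max(row) for row in zip(*table.values())]
srd_scores = {name: srd(vals, reference) for name, vals in table.items()}
best_model = min(srd_scores, key=srd_scores.get)

print(metrics, auc, srd_scores, best_model)
```

A lower SRD value means the model's ranking of the rows is closer to the reference ranking, so the model with the smallest SRD is preferred in this sketch. A full SRD analysis would usually also validate the values against a random-ranking distribution, which is omitted here.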
