Comparison of data augmentation and classification algorithms based on plastic spectroscopy.

Author information

Luo Jiachao, Wu Qunbiao, Cao Jin, Fang Haifeng, Xu Chenyang, He Defang

Affiliations

School of Mechanical Engineering, Jiangsu University of Science and Technology, Jiangsu, 212100, China.

Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China.

Publication information

Anal Methods. 2025 Feb 6;17(6):1236-1251. doi: 10.1039/d4ay01759e.

Abstract

Plastic waste management is one of the key issues in global environmental protection. Integrating spectroscopy acquisition devices with deep learning algorithms has emerged as an effective method for rapid plastic classification. However, the difficulty of collecting plastic samples and their spectra has left datasets small and comparisons of relevant classification algorithms incomplete. To address this issue, we propose a plastic spectroscopy generation model and, building on data augmentation, systematically analyze and compare the performance of different algorithms from multiple perspectives. First, we apply cubic interpolation, normalization, Savitzky-Golay (S-G) filtering, linear detrending, and standard normal variate (SNV) transformation as preprocessing to plastic spectra collected from public datasets with techniques such as Fourier Transform Infrared Spectroscopy (FTIR), Raman Spectroscopy (RAMAN), and Laser Induced Breakdown Spectroscopy (LIBS). Principal Component Analysis (PCA) visualizations indicate that these preprocessing steps help improve classification accuracy, and PCA loadings are used to explain the chemical features that drive classification for each spectral device. Second, to tackle the insufficient sample size, we propose a plastic spectroscopy generation model based on a conditional generative adversarial network (C-GAN), which effectively handles multi-class spectrum generation. The generated spectra are validated qualitatively through difference spectra and t-SNE to confirm their consistency with real spectra, and this conclusion is confirmed quantitatively using Maximum Mean Discrepancy (MMD).
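The preprocessing chain and the MMD check can be sketched in Python with NumPy and SciPy (`scipy.interpolate.CubicSpline`, `scipy.signal.savgol_filter`, `scipy.signal.detrend`). This is a minimal sketch under stated assumptions, not the authors' code: the helper names `preprocess`, `mmd_rbf` and `_sq_dists`, the S-G window of 11 points with polynomial order 3, the reading of "normalization" as min-max scaling, and the RBF kernel with a median-heuristic bandwidth are all illustrative choices that the abstract does not specify. `mmd_rbf` returns the biased estimate of squared MMD, so values near zero mean the real and generated spectra are hard to tell apart under that kernel.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import detrend, savgol_filter


def preprocess(wavenumbers, intensities, grid, window=11, polyorder=3):
    """Resample one spectrum onto `grid`, then normalize, smooth, detrend and SNV-scale it.

    Step order follows the text; window/polyorder are assumed values, not the paper's.
    """
    # 1. Cubic interpolation onto a common axis so all spectra share one length.
    x = CubicSpline(wavenumbers, intensities)(grid)
    # 2. Min-max normalization to [0, 1] (assumed meaning of "normalization").
    x = (x - x.min()) / (x.max() - x.min())
    # 3. Savitzky-Golay smoothing to suppress high-frequency noise.
    x = savgol_filter(x, window_length=window, polyorder=polyorder)
    # 4. Subtract the least-squares linear baseline.
    x = detrend(x, type="linear")
    # 5. Standard normal variate: zero mean, unit standard deviation per spectrum.
    return (x - x.mean()) / x.std()


def _sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A (n, d) and B (m, d)."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


def mmd_rbf(X, Y, gamma=None):
    """Biased squared-MMD estimate between spectra sets X and Y with k(a, b) = exp(-gamma*|a-b|^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if gamma is None:
        # Median heuristic on the pooled squared distances (an assumption).
        Z = np.vstack([X, Y])
        d2 = _sq_dists(Z, Z)
        gamma = 1.0 / np.median(d2[d2 > 0])

    def kernel(A, B):
        return np.exp(-gamma * _sq_dists(A, B))

    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()
```

Because cubic interpolation, S-G filtering and linear detrending are linear operations that preserve constant offsets, and SNV removes any remaining offset and scale, the final output does not depend on the min-max step; that step mainly helps when inspecting intermediate spectra.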
Finally, we compared the classification accuracy of machine learning algorithms, including Support Vector Machine (SVM), Back Propagation Neural Network (BP), K-Nearest Neighbors (KNN), Random Forest (RF), and Decision Tree (DT), with that of deep learning algorithms such as GoogLeNet and ResNet under various conditions. After data augmentation with the plastic spectrum generation model, the accuracy of every classification model improved by at least 3% over its pre-augmentation level. Notably, for data collected via FTIR, classification accuracy peaked at 0.991 with the 1D-ResNet model when the data were augmented twofold. On small datasets, traditional machine learning algorithms such as SVM and RF were stable and accurate, differing only minimally from deep learning algorithms; on large datasets, deep learning algorithms showed a clear advantage. Regarding input format, models taking 1D input generally outperformed those taking 2D input. Grad-CAM visualizations further suggest that the 1D-ResNet model owes its top classification accuracy primarily to identifying peak features in the data more accurately.
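The before/after comparison can be illustrated with a small NumPy sketch that measures test accuracy with and without generated spectra, using K-Nearest Neighbors, one of the algorithms compared above. It is a hedged illustration, not the authors' evaluation protocol: the function names `knn_predict`, `accuracy` and `augmentation_gain` and the default k=3 are assumptions, and it presumes that generated spectra (`X_gen`, `y_gen`, standing in for the C-GAN output) are added to the training split only, while accuracy is measured on held-out real spectra. To avoid leakage, the generator itself should also be fit on the training split only.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]  # (n_test, k) integer class labels
    # bincount(...).argmax() breaks ties in favour of the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])


def accuracy(X_train, y_train, X_test, y_test, k=3):
    """Fraction of test spectra whose predicted class matches the true class."""
    return float((knn_predict(X_train, y_train, X_test, k) == y_test).mean())


def augmentation_gain(X_train, y_train, X_test, y_test, X_gen, y_gen, k=3):
    """Return (accuracy on real training data, accuracy after adding generated spectra)."""
    base = accuracy(X_train, y_train, X_test, y_test, k)
    # Generated spectra only join the training set; the test set stays real.
    X_aug = np.vstack([X_train, X_gen])
    y_aug = np.concatenate([y_train, y_gen])
    aug = accuracy(X_aug, y_aug, X_test, y_test, k)
    return base, aug
```

Replacing `knn_predict` with SVM, RF, DT or BP classifiers (for example, from scikit-learn) would extend the same before/after comparison to the other algorithms.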
