Fu Xianshu, Pan Xiangliang, Chen Jun, Zhang Mingzhou, Ye Zihong, Yu Xiaoping
College of Environment, Zhejiang University of Technology, Hangzhou 310032, China.
Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Hangzhou 310018, China.
Molecules. 2024 Mar 15;29(6):1308. doi: 10.3390/molecules29061308.
The pollution from waste plastic express packages (WPEPs), especially microplastic (MP) fragments, caused by the rapid growth of the express delivery industry has attracted widespread attention. Because plastic express packages (PEPs) contain a wide variety of additives and are highly complex and diverse, the multi-class classification of WPEPs is a typical large-class-number classification (LCNC) problem, and tracing and identifying microplastic fragments from WPEPs is very challenging. An effective chemometric method for LCNC would benefit the comprehensive treatment of WPEP pollution through the recycling and reuse of waste packages, including microplastic fragments and plastic debris. In place of the traditional one-against-one (OAO) and one-against-all (OAA) dichotomies, an exhaustive and parallel half-against-half (EPHAH) decomposition is proposed for feature selection; it overcomes the limited classifier learning of OAO and the imbalanced data proportions of OAA. EPHAH analysis, combined with partial least squares discriminant analysis (PLS-DA) for LCNC, was performed on near-infrared (NIR) spectra of 750 microplastic fragments of polyethylene WPEPs from 10 major courier companies. After abnormal samples were removed by robust principal component analysis (RPCA), the root mean square error of cross-validation (RMSECV) of the model fell to 0.01, which was 21.5% lower than with the abnormal samples included. The best PLS-DA models were obtained with standard normal variate (SNV) transformation combined with 17-point Savitzky-Golay smoothing (SG-17) and the second derivative (2D), denoted SNV+SG-17+2D. The latent variables (LVs), the error rates of Monte Carlo cross-validation (ERMCCVs), and the final classification accuracies were 6.35, 0.155, and 88.67% for OAO-PLSDA; 5.37, 0.103, and 87.33% for OAA-PLSDA; and 3.12, 0.054, and 96.00% for EPHAH-PLSDA.
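The SNV+SG-17+2D preprocessing chain named above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that smoothing and differentiation are done in a single Savitzky-Golay step with a local quadratic fit over a 17-point window, that the wavelength spacing is uniform (taken as 1), and that the window edges are simply dropped; the abstract states none of these details, and all function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 in the denominator)
    return [(v - mu) / sd for v in spectrum]


def sg_second_derivative_weights(window=17):
    """Savitzky-Golay weights giving the second derivative of a local quadratic fit.

    Assumption: polynomial order 2 and unit spacing between points.
    The weights follow from the least-squares normal equations of
    y = a0 + a1*i + a2*i**2 on offsets i = -m..m; the second derivative is 2*a2.
    """
    m = window // 2
    offsets = range(-m, m + 1)
    s2 = sum(i ** 2 for i in offsets)
    s4 = sum(i ** 4 for i in offsets)
    denom = window * s4 - s2 ** 2
    return [2 * (window * i ** 2 - s2) / denom for i in offsets]


def sg_second_derivative(spectrum, window=17):
    """Apply the weights at every interior point; the m points at each edge are dropped."""
    weights = sg_second_derivative_weights(window)
    m = window // 2
    return [
        sum(w * spectrum[j + k] for k, w in zip(range(-m, m + 1), weights))
        for j in range(m, len(spectrum) - m)
    ]


def preprocess(spectrum, window=17):
    """SNV followed by a 17-point Savitzky-Golay second derivative (assumed order)."""
    return sg_second_derivative(snv(spectrum), window)
```

Calling `preprocess(spectrum)` on one spectrum (a list of absorbance values) returns a list that is 16 points shorter than the input, because 8 points are lost at each edge under the assumptions above.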
The results showed that the EPHAH strategy can fully learn the complex LCNC decision boundaries for 10 classes, effectively break ties, and greatly improve the voting resolution, giving it a clear advantage in classification accuracy over both the OAO and OAA strategies. Meanwhile, PLS-DA further maximized the covariance between the latent variables and the categories of microplastic debris and improved data interpretability, yielding an identification model with a recognition rate of 96.00%.
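To make the three decomposition strategies concrete, the sketch below enumerates the binary sub-problems each one would create for 10 classes. It rests on an assumption the abstract does not spell out: that "exhaustive" half-against-half means every split of the classes into two equal halves, with each split counted once. The voting rule (each binary model gives one vote to every class in the half it predicts) is likewise an illustrative guess, and all function names are hypothetical.

```python
from itertools import combinations


def oao_pairs(classes):
    """One-against-one: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def oaa_splits(classes):
    """One-against-all: one binary problem per class versus all the others."""
    return [((c,), tuple(x for x in classes if x != c)) for c in classes]


def half_against_half_splits(classes):
    """Every split of an even-sized class list into two equal halves.

    The first class is pinned to the left half so that a split and its
    mirror image are not counted twice.
    """
    classes = list(classes)
    first, rest = classes[0], classes[1:]
    half = len(classes) // 2
    splits = []
    for others in combinations(rest, half - 1):
        left = (first,) + others
        right = tuple(c for c in classes if c not in left)
        splits.append((left, right))
    return splits


def vote(splits, predict_left):
    """Tally votes: each binary model votes for every class in its predicted half.

    predict_left(left, right) stands in for a trained binary model and
    returns True when the sample is assigned to the left half.
    """
    votes = {}
    for left, right in splits:
        winner = left if predict_left(left, right) else right
        for c in winner:
            votes[c] = votes.get(c, 0) + 1
    return votes
```

For 10 classes this yields 45 OAO problems, 10 OAA problems, and 126 balanced half-against-half splits. With an idealized oracle such as `vote(half_against_half_splits(range(10)), lambda left, right: 3 in left)`, the true class collects all 126 votes while every other class collects 56, which illustrates how a large pool of balanced splits can separate the winner from the runners-up and reduce ties.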