Xing Zheng, Chen Junying, Zhao Xiao, Li Yu, Li Xianwen, Zhang Zhitao, Lao Congcong, Wang Haifeng
Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling, Shaanxi, China.
College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling, Shaanxi, China.
PeerJ. 2019 Dec 12;7:e8255. doi: 10.7717/peerj.8255. eCollection 2019.
Water pollution has been hindering the world's sustainable development. Accurate inversion of water quality parameters in sewage with visible-near infrared spectroscopy can support more effective and rational use and management of water resources. However, the accuracy of spectral models of water quality parameters is often degraded by noise and by the high dimensionality of spectral data. This study aimed to improve model accuracy by optimizing the spectral models based on the sensitive spectral intervals of different water quality parameters. To this end, six kinds of sewage water taken from a biological sewage treatment plant underwent laboratory physical and chemical tests. In total, 87 samples of sewage water were obtained by adding different amounts of pure water to them. The raw reflectance (R_raw) of the samples was collected with an Analytical Spectral Devices (ASD) instrument. The transformed reflectance (R_SNV) was obtained by processing R_raw with the standard normal variate (SNV) transformation. Then, the sensitive spectral intervals of each of the six water quality parameters, namely chemical oxygen demand (COD), biological oxygen demand (BOD), ammonia nitrogen (NH3-N), total dissolved solids (TDS), total hardness (TH) and total alkalinity (TA), were selected using three different methods: gray correlation (GC), variable importance in projection (VIP) and set pair analysis (SPA). Finally, the performance of both extreme learning machine (ELM) and partial least squares regression (PLSR) models built on the sensitive spectral intervals was investigated. The results demonstrated that model accuracy differed depending on which method was used to screen the sensitive spectral ranges. The GC method performed better at reducing redundancy, and the VIP method was better at preserving information.
The SPA method struck the best trade-off between information preservation and redundancy reduction, and it retained the most spectral band intervals that respond well to the inversion parameters. The accuracy of the models built on the sensitive spectral ranges selected by the three analysis methods differed: GC was the highest, SPA came next and VIP was the lowest. On the whole, PLSR and ELM both achieved satisfactory model accuracy, but ELM predicted more accurately than PLSR. The optimal inversion accuracy differed greatly among water quality parameters: it was very high for COD, BOD and TN, relatively high for TA, and relatively low for TDS and TH. These findings can provide a new way to optimize spectral models of wastewater biochemical parameters and thus improve their prediction precision.
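As an illustration of two steps named in the abstract, the following minimal sketch shows a standard normal variate transform of a single spectrum and a textbook grey relational grade used to rank bands against a measured water quality parameter. It is not the authors' implementation. The function names (`snv`, `min_max`, `grey_relational_grades`), the min-max normalization, the use of the sample standard deviation, and the distinguishing coefficient `rho = 0.5` are all assumptions chosen for illustration; the abstract does not state which variants the study used.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation (assumed variant)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def min_max(seq):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0 for _ in seq]
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference, bands, rho=0.5):
    """Grey relational grade of each band against the reference sequence.

    reference: one measured value per sample (e.g. COD), hypothetical input.
    bands: list of sequences, each holding one reflectance value per sample.
    rho: distinguishing coefficient (0.5 is a common default, assumed here).
    """
    ref = min_max(reference)
    # Absolute differences between the normalized reference and each band.
    deltas = [
        [abs(r - b) for r, b in zip(ref, min_max(band))]
        for band in bands
    ]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every band matches the reference exactly after normalization.
        return [1.0 for _ in bands]
    # Grade = mean grey relational coefficient over all samples.
    return [
        mean([(d_min + rho * d_max) / (d + rho * d_max) for d in row])
        for row in deltas
    ]


# Toy data: band 0 tracks the reference perfectly, band 1 does not.
grades = grey_relational_grades([1, 2, 3, 4], [[10, 20, 30, 40], [4, 1, 3, 2]])
```

In this toy example the first band receives a higher grade than the second, so a grade threshold would keep it as part of a sensitive interval. The threshold itself would be another modelling choice that the abstract does not specify.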