Chair of Hydrology and River Basin Management, Technical University of Munich, Arcisstrasse 21, 80333 Munich, Germany.
Sensors (Basel). 2023 Jun 30;23(13):6057. doi: 10.3390/s23136057.
Despite advancements in sensor technology, monitoring nutrients in situ and in real time is still challenging and expensive. Soft sensors, based on data-driven models, offer an alternative to direct nutrient measurements. However, the large volume of data required to develop them poses logistical challenges for data handling. To address this, the study aimed to determine the optimal subset of predictors and the optimal sampling frequency for developing nutrient soft sensors using random forest. The study used water quality data recorded at 15-min intervals at two automatic stations on the Main River, Germany, with dissolved oxygen, temperature, conductivity, pH, streamflow, and cyclical time features as predictors. The optimal subset of predictors was identified using forward subset selection, and the models fitted with the optimal predictors produced R values above 0.95 for nitrate, orthophosphate, and ammonium at both stations. The models were then trained at 40 sampling frequencies, ranging from monthly to 15-min intervals. Model performance, measured by the root mean square error (RMSE), improved as the sampling frequency increased. The optimal balance between sampling frequency and model performance was identified using a knee-point determination algorithm. The optimal sampling interval for nitrate was 3.6 h and 2.8 h at the two stations, respectively; for orthophosphate, 2.4 h and 1.8 h; and for ammonium, 2.2 h at one station. The study highlights the utility of surrogate models for monitoring nutrient levels and demonstrates that nutrient soft sensors can function with fewer predictors and at lower sampling frequencies without a significant loss in performance.
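Forward subset selection can be sketched as a greedy loop. The sketch below is a minimal, library-agnostic illustration and not the study's implementation. The `score` callable is a hypothetical stand-in for the metric the models were evaluated with, for example a validation score of a random forest fitted on the candidate subset. The stopping rule, which stops when the best remaining predictor no longer improves the score by more than `min_gain`, is also an assumption. The weights in the usage lines are made up and are not findings of the study.

```python
from typing import Callable, Sequence


def forward_select(
    candidates: Sequence[str],
    score: Callable[[list[str]], float],
    min_gain: float = 0.0,
) -> list[str]:
    """Greedy forward subset selection (higher score = better).

    Start from an empty subset and repeatedly add the candidate whose
    addition yields the highest score; stop when the best addition no
    longer improves the score by more than `min_gain` (an assumed rule).
    """
    selected: list[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        # Evaluate every one-predictor extension of the current subset.
        scores = {c: score(selected + [c]) for c in remaining}
        best_next = max(scores, key=scores.get)
        if scores[best_next] - best_score <= min_gain:
            break  # no remaining predictor improves the score enough
        selected.append(best_next)
        remaining.remove(best_next)
        best_score = scores[best_next]
    return selected


# Hypothetical stand-in for a model score: fixed per-predictor weights minus
# a size penalty. A real soft sensor would refit a random forest on `subset`
# and return a validation score instead. These weights are made up.
WEIGHTS = {"conductivity": 0.5, "temperature": 0.3, "dissolved_oxygen": 0.1,
           "pH": 0.01, "streamflow": 0.0}


def toy_score(subset: list[str]) -> float:
    return sum(WEIGHTS[p] for p in subset) - 0.05 * len(subset)


print(forward_select(list(WEIGHTS), toy_score))
# ['conductivity', 'temperature', 'dissolved_oxygen']
```

Each pass fits one model per remaining candidate. For p candidate predictors, that means at most p(p + 1)/2 model fits, which stays cheap for a small predictor set.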
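The knee-point step can be illustrated with a simplified Kneedle-style rule. The abstract does not name the knee-point algorithm, so this rule is an assumption. Both axes are rescaled to [0, 1], and the knee is the point lying furthest below the straight line that joins the first and last points of a decreasing, convex RMSE curve. Here `x` is the sampling frequency in samples per day, which is an illustrative choice of axis. The RMSE values in the usage lines are synthetic and do not come from the study.

```python
def knee_point(x, y):
    """Return the x value at the knee of a decreasing, convex curve y(x).

    Simplified Kneedle-style rule (an assumption, not necessarily the
    study's algorithm): rescale both axes to [0, 1] and pick the point
    lying furthest below the chord from the first to the last point.
    Assumes y decreases as x increases.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired points")
    pts = sorted(zip(x, y))  # order by x, e.g. samples per day
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x_span = xs[-1] - xs[0]
    y_lo, y_hi = min(ys), max(ys)
    if x_span == 0 or y_hi == y_lo:
        raise ValueError("degenerate curve")
    best_x, best_gap = xs[0], float("-inf")
    for xi, yi in zip(xs, ys):
        xn = (xi - xs[0]) / x_span          # normalised x in [0, 1]
        yn = (yi - y_lo) / (y_hi - y_lo)    # normalised y in [0, 1]
        gap = (1.0 - xn) - yn               # height below the chord y = 1 - x
        if gap > best_gap:
            best_x, best_gap = xi, gap
    return best_x


# Synthetic example: RMSE falls steeply, then flattens, as the sampling
# frequency (samples per day) rises. These numbers are illustrative only.
freq_per_day = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rmse = [10, 5, 2, 1.5, 1.2, 1.1, 1.0, 0.95, 0.9, 0.85]
knee = knee_point(freq_per_day, rmse)
print(knee, "samples/day ->", 24 / knee, "h interval")  # 3 samples/day -> 8.0 h interval
```

Converting the knee back to an interval (24 divided by samples per day) gives an hour value in the same form as the intervals reported in the abstract. The chord-distance rule depends on the range of frequencies included. Adding a much sparser or denser end point changes the normalisation and can move the knee.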