Key Laboratory of Integrated Regulation and Resource Development on Shallow Lakes, Ministry of Education, College of Environment, Hohai University, Nanjing, 210098, China.
Key Laboratory of Integrated Regulation and Resource Development on Shallow Lakes, Ministry of Education, College of Environment, Hohai University, Nanjing, 210098, China; Guohe Environmental Research Institute (Nanjing) Co., Ltd, Nanjing, 211599, China.
Water Res. 2020 Oct 1;184:116103. doi: 10.1016/j.watres.2020.116103. Epub 2020 Jun 30.
Data-driven models are suitable for simulating biological wastewater treatment processes with complex intrinsic mechanisms. However, the raw data collected in the early stage of biological experiments are normally insufficient to train such models. In this study, an integrated modeling approach combining the random standard deviation sampling (RSDS) method with deep neural network (DNN) models was established to predict volatile fatty acid (VFA) production in the anaerobic fermentation process. The RSDS method, based on the mean values (x̄) and standard deviations (α) calculated from multiple experimental determinations, was first developed for virtual data augmentation. DNN models were then established to learn features from the virtual data and predict VFA production. When 20000 virtual samples, each comprising five input variables of the anaerobic fermentation process, were used to train a DNN model with 16 hidden layers and 100 hidden neurons in each layer, the highest correlation coefficient (0.998) and the lowest mean absolute percentage error (3.28%) were achieved. This integrated approach can learn nonlinear information from virtual data generated by the RSDS method, and consequently enlarges the application range of DNN models in simulating biological wastewater treatment processes with small datasets.
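The augmentation idea can be illustrated with a minimal sketch. The abstract does not state the exact RSDS sampling rule, so the code below assumes one plausible reading: each virtual value is x̄ + r·α with r drawn uniformly from [-1, 1]. The function names (`rsds_augment`, `mape`, `correlation`) and all numbers are hypothetical and are not taken from the paper. The two metric helpers correspond to the mean absolute percentage error and the correlation coefficient reported above.

```python
import numpy as np

def rsds_augment(means, stds, n_virtual, rng):
    """Generate virtual samples around measured means (assumed sampling rule).

    means, stds: arrays of shape (n_conditions, n_variables) holding the mean
    and standard deviation of replicate measurements for each condition.
    Returns an array of shape (n_virtual * n_conditions, n_variables).
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Assumption: the random factor r is uniform in [-1, 1], so every virtual
    # value lies within one standard deviation of its mean.
    r = rng.uniform(-1.0, 1.0, size=(n_virtual,) + means.shape)
    virtual = means + r * stds  # broadcasts over the leading n_virtual axis
    return virtual.reshape(-1, means.shape[1])

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def correlation(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical example: two experimental conditions, two measured variables.
rng = np.random.default_rng(0)
means = np.array([[7.0, 35.0], [9.0, 35.0]])
stds = np.array([[0.1, 0.5], [0.2, 0.4]])
virtual = rsds_augment(means, stds, n_virtual=1000, rng=rng)
print(virtual.shape)  # (2000, 2)
```

In a workflow like the one described, the virtual samples would serve as training data for the DNN, and the two metrics would be computed against measured VFA values. A different distribution for r (for example, a normal distribution) would be an equally simple change under this sketch.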