Nafii Ayoub, Lamane Houda, Taleb Abdeslam, El Bilali Ali
Hassan II University of Casablanca, Faculty of Sciences and Techniques of Mohammedia, Morocco.
River Basin Agency of Bouregreg and Chaouia, 13000 Benslimane, Morocco.
MethodsX. 2023 Feb 2;10:102034. doi: 10.1016/j.mex.2023.102034. eCollection 2023.
Machine learning (ML) models have become a valuable tool in water resources modelling. However, they require large datasets for training and validation, which poses a challenge in data-scarce environments, particularly in poorly monitored basins. In such settings, the Virtual Sample Generation (VSG) method can help overcome this limitation when developing ML models. The main aim of this manuscript is to introduce a novel VSG method based on a multivariate distribution and a Gaussian copula, called MVD-VSG, which generates appropriate virtual combinations of groundwater quality parameters to train a Deep Neural Network (DNN) for predicting the Entropy Weighted Water Quality Index (EWQI) of aquifers, even from small datasets. The MVD-VSG is original and was validated in its initial application using sufficient observed datasets collected from two aquifers. The validation results showed that, from only 20 original samples, the MVD-VSG provided enough accuracy to predict EWQI with an NSE of 0.87. The companion publication of this Method paper is El Bilali et al. [1].
• Development of MVD-VSG to generate virtual combinations of groundwater parameters in data-scarce environments.
• Training a deep neural network to predict groundwater quality.
• Validation of the method with sufficient observed datasets and sensitivity analysis.
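To make the idea of copula-based virtual sample generation concrete, the following is a minimal sketch of a generic Gaussian copula sampler. It is not the authors' MVD-VSG implementation, whose details are given in the full paper. The sketch assumes empirical (rank-based) marginals, a correlation matrix estimated from normal scores, and variables that are not perfectly correlated. All function names are hypothetical, and `nse` assumes that NSE denotes the Nash-Sutcliffe efficiency.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_scores(column):
    """Map one observed variable to normal scores through its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(columns):
    """Pearson correlation between every pair of columns."""
    def corr(a, b):
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
        var_a = sum((x - mean_a) ** 2 for x in a)
        var_b = sum((y - mean_b) ** 2 for y in b)
        return cov / math.sqrt(var_a * var_b)
    return [[corr(a, b) for b in columns] for a in columns]


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (positive definite input)."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def empirical_quantile(sorted_values, u):
    """Linearly interpolate the sorted observations at probability u."""
    pos = u * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def generate_virtual_samples(samples, n_virtual, seed=0):
    """Draw virtual rows that keep the observed dependence structure."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*samples)]
    d = len(columns)
    lower = cholesky(correlation_matrix([normal_scores(c) for c in columns]))
    sorted_cols = [sorted(c) for c in columns]
    virtual = []
    for _ in range(n_virtual):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlated normals, then uniforms, then back to the data scale.
        z = [sum(lower[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        virtual.append([
            empirical_quantile(sorted_cols[i], STD_NORMAL.cdf(z[i]))
            for i in range(d)
        ])
    return virtual


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


# Hypothetical observations (three made-up parameters, not data from the paper).
observed = [
    [7.1, 250.0, 12.0], [7.4, 310.0, 9.5], [6.9, 330.0, 15.8],
    [7.8, 420.0, 11.0], [7.2, 275.0, 21.1], [7.6, 190.0, 16.9],
]
virtual = generate_virtual_samples(observed, n_virtual=100, seed=42)
```

Because the marginals are empirical, every virtual value stays within the observed range of its variable. A parametric marginal fit would be needed to extrapolate beyond it, which this sketch does not attempt.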