Geleijnse Jan, Rutten Martine, de Villiers Didier, Bamwenda James Tayebwa, Abraham Edo
Department of Water Management, Delft University of Technology, Mekelweg, 2628 CD, Delft, The Netherlands.
UNICEF, Nairobi, Kenya.
Sci Rep. 2023 Aug 18;13(1):13433. doi: 10.1038/s41598-023-39917-6.
Monitoring safe water access in developing countries relies primarily on household health survey and census data. These surveys are often incomplete: they tend to focus on the primary water source only, are spatially coarse, and are usually conducted only every 5-10 years, a period during which significant changes in urbanisation and infrastructure provision can occur, especially in sub-Saharan Africa. In this work, we present a data-driven approach that utilises and complements survey-based data on water access to provide context-specific and disaggregated monitoring. The level of access to improved water and sanitation has been shown to vary with geographical inequalities related to the availability of water resources and terrain, population density, and socio-economic determinants such as income and education. We use such data to successfully predict the level of water access in areas for which data are lacking, providing spatially explicit and community-level monitoring possibilities for mapping geographical inequalities in access. This is showcased by applying three machine learning models that use such geographical data to predict the number of presences of water access points of eight different access types across Uganda, at a 1 km by 1 km grid resolution. Two Multi-Layer Perceptron (MLP) models and a Maximum Entropy (MaxEnt) model are developed and compared, and the former are shown to consistently outperform the latter. The best-performing neural network model achieved a True Positive Rate of 0.89 and a False Positive Rate of 0.24, compared to 0.85 and 0.46 respectively for the MaxEnt model. The models improve on previous work on water point modeling by using neural networks and by introducing the True Positive Rate and False Positive Rate as better evaluation metrics, which are also used to assess the MaxEnt model. We also present a scaling method to move from predicting only the relative probability of water point presences to predicting the absolute number of presences.
To challenge both the model results and the more standard health surveys, a new household-level survey was carried out in Bushenyi, a mid-sized town in the south-west of Uganda, asking specifically about the multiple water sources that households use. On average, Bushenyi households reported using 1.9 water sources. The survey further showed that the actual presence of a source does not always imply that it is used. Relying solely on models for water access monitoring is therefore not an option. Household surveys remain necessary, but they should be extended with questions on the multiple sources that households use.
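For readers unfamiliar with the two evaluation metrics quoted above, the following is a minimal sketch of how a True Positive Rate and a False Positive Rate can be computed from binary presence/absence labels per grid cell. The function name `tpr_fpr` and the toy labels are illustrative assumptions; they are not taken from the study, and the authors' actual evaluation procedure (for example, how predictions are thresholded) is not described in the abstract.

```python
from typing import Sequence, Tuple


def tpr_fpr(observed: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (True Positive Rate, False Positive Rate) for 0/1 labels.

    observed[i]  is 1 if a water point is actually present in grid cell i.
    predicted[i] is 1 if the model predicts a presence in grid cell i.
    (Hypothetical helper, not the authors' code.)
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # presences found
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # presences missed
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false alarms
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # correct absences

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one observed presence and one absence")

    # TPR = TP / (TP + FN); FPR = FP / (FP + TN)
    return tp / (tp + fn), fp / (fp + tn)


# Toy example with ten grid cells (made-up labels, for illustration only).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1]
tpr, fpr = tpr_fpr(observed, predicted)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

A higher True Positive Rate means fewer actual water points are missed, while a lower False Positive Rate means fewer cells are wrongly flagged as having one. This is why two models with similar True Positive Rates (0.89 versus 0.85) can still differ substantially in quality when their False Positive Rates differ (0.24 versus 0.46).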