Materials Science and Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States.
School of Sustainable Engineering & the Built Environment, Arizona State University, Tempe, Arizona 85287, United States.
Environ Sci Technol. 2024 Nov 19;58(46):20513-20524. doi: 10.1021/acs.est.4c05203. Epub 2024 Nov 7.
Accurately assessing and managing risks associated with inorganic pollutants in groundwater is imperative. Historic water quality databases are often sparse because sampling rationale or budget constraints limit sample collection and analysis, which makes it difficult to evaluate exposure or water treatment effectiveness. We applied and compared two multiple imputation techniques, the AMELIA and MICE algorithms, to fill gaps in sparse groundwater quality data sets. AMELIA outperformed MICE in handling missing values: MICE tended to overestimate certain values, resulting in more outliers. In the field data sets, 75% to 80% of samples showed no co-occurring regulated pollutants above their maximum contaminant level (MCL) values, whereas the imputed data indicated that only 15% to 55% of samples posed no health risk. The imputed data showed a 2- to 5-fold increase in the number of sampling locations predicted to potentially exceed health-based limits, and identified samples in which 2 to 6 co-occurring chemicals may exceed health-based levels. Linking imputed data to sampling locations can pinpoint potential hotspots of elevated chemical levels and guide resource allocation for additional field sampling and chemical analysis. With this approach, further analysis of the completed data sets allows state agencies authorized to conduct groundwater monitoring, often with limited financial resources, to prioritize the sampling locations and chemicals to be tested. Given the constraints of existing data and time, it is crucial to identify the most strategic use of available resources to address data gaps. This work establishes a framework to enhance the beneficial impact of funding groundwater data collection by reducing uncertainty in prioritizing future sampling locations and chemical analyses.
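The co-occurrence comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the imputation step itself (AMELIA and MICE are available as R packages) is not shown, the sample records and the "completed" values are hypothetical, and the function name `exceedance_counts` is an assumption made for illustration. The three limits are the US EPA MCLs for arsenic, fluoride, and nitrate (as N), used here only as example thresholds.

```python
# Example health-based limits in mg/L (US EPA MCLs for these three analytes).
MCL_MG_L = {"arsenic": 0.010, "fluoride": 4.0, "nitrate_as_N": 10.0}


def exceedance_counts(samples, limits=MCL_MG_L):
    """For each sample, count the analytes whose value exceeds its limit.

    A value of None marks a missing measurement and is skipped, so a sparse
    field record can only show exceedances for what was actually analyzed.
    """
    counts = []
    for sample in samples:
        n = sum(
            1
            for analyte, limit in limits.items()
            if sample.get(analyte) is not None and sample[analyte] > limit
        )
        counts.append(n)
    return counts


# Hypothetical sparse field records (None = analyte not measured).
field = [
    {"arsenic": 0.004, "fluoride": None, "nitrate_as_N": None},
    {"arsenic": None, "fluoride": 1.2, "nitrate_as_N": 12.0},
    {"arsenic": 0.015, "fluoride": None, "nitrate_as_N": 3.0},
    {"arsenic": None, "fluoride": None, "nitrate_as_N": 2.0},
]

# Hypothetical completed records, as an imputation method might return them.
completed = [
    {"arsenic": 0.004, "fluoride": 4.5, "nitrate_as_N": 11.0},
    {"arsenic": 0.012, "fluoride": 1.2, "nitrate_as_N": 12.0},
    {"arsenic": 0.015, "fluoride": 0.8, "nitrate_as_N": 3.0},
    {"arsenic": 0.006, "fluoride": 0.9, "nitrate_as_N": 2.0},
]

field_counts = exceedance_counts(field)
completed_counts = exceedance_counts(completed)

# Samples with two or more analytes above their limits (co-occurrence).
field_cooccurring = sum(1 for n in field_counts if n >= 2)
completed_cooccurring = sum(1 for n in completed_counts if n >= 2)

print("field:", field_counts, "co-occurring samples:", field_cooccurring)
print("completed:", completed_counts, "co-occurring samples:", completed_cooccurring)
```

In this toy data the sparse records show no sample with two or more exceedances, while the completed records flag two, which mirrors the direction of the effect the abstract reports. Exceedances in a completed data set are predictions that point to where confirmatory sampling is worthwhile, not measured results.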