RTI International, Research Triangle Park, North Carolina 27709, United States.
Environmental Health Section, Division of Public Health, North Carolina Department of Health and Human Services, Raleigh, North Carolina 27609, United States.
Environ Sci Technol. 2023 Nov 21;57(46):17959-17970. doi: 10.1021/acs.est.2c07477. Epub 2023 Mar 18.
Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities to optimize limited resources. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina according to maximum and 90th percentile lead levels from water lead concentrations at 22,943 taps. The performance of the BN models was compared to common alternative risk factors, or heuristics, used to inform water lead testing programs among child care facilities including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead, with facilities that serve low-income families, rely on groundwater, and have more taps exhibiting greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' F-scores outperformed each of the alternative heuristics by 118-213%. This represents up to a 60% increase in the number of high-risk facilities that could be identified and up to a 49% decrease in the number of samples that would need to be collected by using BN model-informed sampling compared to using simple heuristics. Overall, this study demonstrates the value of machine-learning approaches for identifying high water lead risk that could improve lead testing programs nationwide.
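The F-score comparison reported above can be illustrated with a minimal sketch. It assumes the balanced F1 score (the abstract does not state which F-score variant was used) and uses made-up confusion counts that are not taken from the study; the function names `f_score` and `percent_improvement` are hypothetical.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts (beta=1 gives F1).

    tp: high-risk facilities correctly flagged
    fp: low-risk facilities incorrectly flagged
    fn: high-risk facilities missed
    Returns 0.0 when the score is undefined (no positives at all).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def percent_improvement(model_f, heuristic_f):
    """Relative gain of the model's F-score over a heuristic's, in percent."""
    return 100.0 * (model_f - heuristic_f) / heuristic_f


# Illustrative counts only (assumed, not study data).
model_f = f_score(tp=60, fp=40, fn=40)       # hypothetical BN model
heuristic_f = f_score(tp=20, fp=60, fn=80)   # hypothetical building-age rule
gain = percent_improvement(model_f, heuristic_f)
print(f"model F1={model_f:.3f}, heuristic F1={heuristic_f:.3f}, gain={gain:.0f}%")
```

A relative gain computed this way is sensitive to how low the heuristic's score is, so a large percentage can reflect a weak baseline as much as a strong model.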