Gu Yu, Preisser John S, Zeng Donglin, Shrestha Poojan, Shah Molina, Simancas-Pallares Miguel A, Ginnis Jeannie, Divaris Kimon
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill.
Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill.
Ann Appl Stat. 2022 Mar;16(1):551-572. doi: 10.1214/21-aoas1516. Epub 2022 Mar 28.
Community water fluoridation is an important component of oral health promotion, as fluoride is a well-documented caries-preventive agent. Direct measurements of domestic water fluoride content provide valuable information about individuals' fluoride exposure and thus their caries risk; however, such measurements are logistically challenging to carry out at a large scale in oral health research. This article describes the development and evaluation of a novel method, informed by spatial autocorrelation, for imputing missing domestic water fluoride concentration data. The context is a statewide epidemiologic study of pediatric oral health in North Carolina, in which domestic water fluoride concentration information was missing for approximately 75% of study participants with clinical data on dental caries. A new machine-learning-based imputation method that combines partitioning around medoids clustering and random forest classification (PAMRF) is developed and implemented. Imputed values are filtered according to allowable error rates or a target sample size, depending on the requirements of each application. In leave-one-out cross-validation and simulation studies, PAMRF outperforms four existing imputation approaches: two conventional spatial interpolation methods (inverse-distance weighting, IDW, and universal kriging, UK) and two supervised learning methods (k-nearest neighbors, KNN, and classification and regression trees, CART). Including multiply imputed values in the estimation of the association between fluoride concentration and dental caries prevalence resulted in essentially no change in the PAMRF estimates but substantial gains in precision due to the larger effective sample size. PAMRF is a powerful new method for imputing missing fluoride values where geographical information exists.
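For readers unfamiliar with the comparison setup, the sketch below illustrates one of the baselines named in the abstract, inverse-distance weighting (IDW), evaluated with leave-one-out cross-validation (LOOCV). It is not PAMRF and not the authors' code. The names `idw_predict` and `loocv_rmse`, the power parameter of 2, the use of RMSE as the error metric, and the toy coordinates and fluoride values are all hypothetical assumptions chosen for illustration; they are not study data.

```python
import math


def idw_predict(train, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    train: list of (x_i, y_i, value_i) triples with observed fluoride values.
    power: distance-decay exponent (2.0 is a common default, assumed here).
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in train:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # exact location match: return the observed value
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den


def loocv_rmse(points, power=2.0):
    """Leave-one-out CV: predict each point from all others, return RMSE."""
    sq_err = 0.0
    for i, (x, y, v) in enumerate(points):
        rest = points[:i] + points[i + 1:]  # hold out point i
        pred = idw_predict(rest, x, y, power)
        sq_err += (pred - v) ** 2
    return math.sqrt(sq_err / len(points))


# Hypothetical toy data: (x, y, fluoride in mg/L), purely illustrative.
points = [(0.0, 0.0, 0.7), (1.0, 0.0, 0.7), (0.0, 1.0, 0.1), (1.0, 1.0, 0.1)]
print(idw_predict(points, 0.5, 0.0))
print(loocv_rmse(points))
```

Because IDW returns a weighted average of neighboring observations, it produces a continuous estimate everywhere. The abstract instead describes PAMRF as a classification-based method whose imputed values can be filtered by allowable error rate or target sample size, which a plain interpolator like this does not support directly.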