Lin Pi-I D, Rifas-Shiman Sheryl L, Aris Izzuddin M, Daley Matthew F, Janicke David M, Heerman William J, Chudnov Daniel L, Freedman David S, Block Jason P
Division of Chronic Disease Research Across the Lifecourse (CoRAL), Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA.
Institute for Health Research, Kaiser Permanente Colorado, Aurora, Colorado, USA.
JAMIA Open. 2022 Nov 2;5(4):ooac089. doi: 10.1093/jamiaopen/ooac089. eCollection 2022 Dec.
To demonstrate the utility of an anthropometric data cleaning method designed for electronic health records (EHR).
We used all available pediatric and adult height and weight data from an ongoing observational study that includes EHR data from 15 healthcare systems. We applied the method to identify outliers and errors, and compared its performance in pediatric data with that of 2 other pediatric data cleaning methods: (1) conditional percentile and (2) the PaEdiatric ANthropometric measurement Outlier Flagging pipeline.
A total of 687 226 children (<20 years) and 3 267 293 adults contributed 71 246 369 weight and 51 525 487 height measurements. The method flagged 18% of pediatric and 12% of adult measurements for exclusion, mostly as carried-forward measures for pediatric data and duplicates for adult and pediatric data. After removing the flagged measurements, 0.5% and 0.6% of the pediatric heights and weights and 0.3% and 1.4% of the adult heights and weights, respectively, were biologically implausible according to the CDC and other established cut points. Compared with the other pediatric cleaning methods, the method flagged the most measurements for exclusion; however, it did not flag some more extreme measurements. The prevalence of severe pediatric obesity was 9.0% after cleaning by the method, and 9.2% and 8.0% after cleaning by the 2 comparison methods.
The method is useful for cleaning pediatric and adult height and weight data. It is the only method of the 3 with the ability to clean adult data and to identify carried-forward and duplicate measurements, which are prevalent in EHR data. Findings of this study can be used to improve the algorithm.
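To illustrate what flagging duplicates and carried-forward measurements can look like in practice, the following is a minimal Python sketch. It is not the algorithm evaluated in this study: the `Measurement` layout, the `flag_measurements` function, and both flagging rules are simplified assumptions made for illustration only (a duplicate is taken to be a second value on the same day, and a carried-forward measure is taken to be a value identical to the last included one on a later day).

```python
from collections import namedtuple

# Hypothetical record layout; field names are illustrative only.
Measurement = namedtuple("Measurement", "subject_id param day value")


def flag_measurements(records):
    """Return one label per record: 'include', 'duplicate', or 'carried_forward'.

    Simplified assumptions (not the published algorithm's rules):
    - duplicate: a later value for the same subject, parameter, and day
      as the last included record
    - carried_forward: the same value as the last included record for that
      subject and parameter, recorded on a different (later) day
    """
    labels = ["include"] * len(records)
    # Visit records in subject/parameter/day order, keeping original positions.
    order = sorted(
        range(len(records)),
        key=lambda i: (records[i].subject_id, records[i].param, records[i].day),
    )
    last = {}  # (subject_id, param) -> (day, value) of the last included record
    for i in order:
        r = records[i]
        key = (r.subject_id, r.param)
        prev = last.get(key)
        if prev is not None and prev[0] == r.day:
            labels[i] = "duplicate"
            continue
        if prev is not None and prev[1] == r.value:
            labels[i] = "carried_forward"
            continue
        last[key] = (r.day, r.value)
    return labels


records = [
    Measurement("a", "weight_kg", 0, 20.0),
    Measurement("a", "weight_kg", 0, 20.4),   # second value on the same day
    Measurement("a", "weight_kg", 30, 20.0),  # identical to the day-0 value
    Measurement("a", "weight_kg", 60, 21.1),
    Measurement("a", "height_cm", 0, 110.0),
]
labels = flag_measurements(records)
print(labels)
```

In this toy input, the second day-0 weight is labelled a duplicate and the day-30 weight is labelled carried-forward. A real cleaning pipeline would need additional rules, for example for unit errors and biologically implausible values, which this sketch does not attempt.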