Department of Health Outcomes and Biomedical Informatics, University of Florida School of Medicine, Gainesville, FL, USA.
Lifecourse Epidemiology of Adiposity and Diabetes Center, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
Sci Rep. 2024 Aug 6;14(1):18276. doi: 10.1038/s41598-024-69161-5.
Tracking trajectories of body size in children provides insight into chronic disease risk. One measure of pediatric body size is body mass index (BMI), a function of height and weight. Errors in measuring height or weight may lead to incorrect assessment of BMI. Yet childhood measures of height and weight extracted from electronic medical records often include values that seem biologically implausible in the context of a growth trajectory. Removing biologically implausible values reduces noise in the data, and thus makes it easier to model associations between exposures and childhood BMI trajectories, or between childhood BMI trajectories and subsequent health conditions. We developed open-source algorithms (available on GitHub) for detecting and removing biologically implausible values in pediatric trajectories of height and weight. A Monte Carlo simulation experiment compared the sensitivity, specificity, and speed of our algorithms to those of three published algorithms. The comparator algorithms were selected because they used trajectory information, had open-source code, and had published verification studies. Simulation inputs were derived from longitudinal epidemiological cohorts. Our algorithms had higher specificity, with similar sensitivity and speed, when compared to the three published algorithms. The results suggest that our algorithms should be adopted for cleaning longitudinal pediatric growth data.
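As a rough illustration of two quantities named in the abstract, the sketch below computes BMI from height and weight and scores a flagging algorithm by sensitivity and specificity. It is not the authors' algorithm or their simulation code. The function names `bmi` and `sensitivity_specificity`, the units (kilograms and centimeters), and the example values are all assumptions made for illustration.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 (standard definition: weight / height^2)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def sensitivity_specificity(truth, flagged):
    """Score a flagging algorithm against known errors.

    truth[i]   is True when measurement i is a known (e.g. simulated) error.
    flagged[i] is True when the algorithm marked measurement i as implausible.
    Assumes at least one true error and one true non-error are present;
    otherwise a denominator below is zero.
    """
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # errors correctly flagged
    fn = sum(1 for t, f in pairs if t and not f)      # errors missed
    tn = sum(1 for t, f in pairs if not t and not f)  # valid values kept
    fp = sum(1 for t, f in pairs if not t and f)      # valid values wrongly flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical example: a child weighing 20 kg whose true height of 110 cm
# is recorded as 100 cm. The height error alone inflates the computed BMI.
true_bmi = bmi(20.0, 110.0)
recorded_bmi = bmi(20.0, 100.0)

# Hypothetical labels for five measurements.
truth = [True, True, False, False, False]
flagged = [True, False, False, False, True]
sens, spec = sensitivity_specificity(truth, flagged)
```

In this made-up example a 10 cm height error moves the computed BMI from about 16.5 to 20.0 kg/m^2, which shows how a single mis-recorded value can distort a trajectory. Higher specificity, the property reported for the authors' algorithms, means fewer valid measurements are discarded.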