Moin Emily E, Seewald Nicholas J, Halpern Scott D
Division of Pulmonary, Allergy, and Critical Care, University of Pennsylvania, Philadelphia.
Palliative and Advanced Illness Research (PAIR) Center, University of Pennsylvania, Philadelphia.
medRxiv. 2025 Mar 13:2025.03.12.25323846. doi: 10.1101/2025.03.12.25323846.
Height recorded in electronic health records (EHRs) is used extensively in diagnosis and treatment, either alone or as a component of body mass index (BMI), but it is often falsely high because many adults overestimate their height. Statistical models that predict measured height could therefore improve population health, but to date such models have required extensive input data and have not been externally validated.
Using a random 90% sample of data from the National Health and Nutrition Examination Survey (NHANES), we developed sex-stratified models that predict examiner-measured height from self-reported height and age. We internally validated the models in the held-out 10% sample and externally validated them in two cohorts: the National Longitudinal Study of Adolescent to Adult Health (Add Health) and the University of Michigan Health and Retirement Study (HRS). We assessed discrimination with the C-index, calibration by visual inspection of calibration plots, and accuracy with root mean square error (RMSE).
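The workflow above (random 90/10 split, one model per sex, then RMSE and C-index on the held-out set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not give the model form, so a linear model `measured ~ b0 + b1 * self_reported + b2 * age` is assumed, and the record layout and all function names (`split_90_10`, `fit_sex_stratified`, `c_index`, etc.) are hypothetical. The C-index shown is a Harrell-style concordance adapted to a continuous outcome; the study may have used a different variant.

```python
import math
import random

# ASSUMPTIONS (not from the study): each record is a hypothetical tuple
# (sex, self_reported_cm, age_years, measured_cm), and within each sex the
# model is linear: measured ~ b0 + b1 * self_reported + b2 * age.


def split_90_10(records, seed=0):
    """Randomly assign ~90% of records to training and the rest to a held-out test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row)
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_sex_stratified(train):
    """Fit a separate coefficient vector (b0, b1, b2) for each sex."""
    models = {}
    for sex in {rec[0] for rec in train}:
        rows = [rec for rec in train if rec[0] == sex]
        X = [[1.0, rec[1], rec[2]] for rec in rows]
        y = [rec[3] for rec in rows]
        models[sex] = fit_ols(X, y)
    return models


def predict(models, sex, self_reported_cm, age_years):
    b0, b1, b2 = models[sex]
    return b0 + b1 * self_reported_cm + b2 * age_years


def rmse(pred, obs):
    """Root mean square error between paired predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def c_index(pred, obs):
    """Concordance for a continuous outcome: among pairs with different observed
    values, the share whose predictions are ordered the same way (prediction
    ties count 0.5). Naive O(n^2) loop, suitable for illustration only."""
    concordant, usable = 0.0, 0
    for i in range(len(obs)):
        for j in range(i + 1, len(obs)):
            if obs[i] == obs[j]:
                continue
            usable += 1
            d = (pred[i] - pred[j]) * (obs[i] - obs[j])
            concordant += 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
    return concordant / usable


if __name__ == "__main__":
    # Simulated data (illustrative only, NOT study data): over-reporting grows with age.
    rng = random.Random(1)
    data = []
    for _ in range(500):
        sex = rng.choice(["F", "M"])
        measured = rng.gauss(162 if sex == "F" else 176, 7)
        age = rng.uniform(20, 80)
        reported = measured + 1.0 + 0.03 * (age - 20) + rng.gauss(0, 2)
        data.append((sex, reported, age, measured))
    train, test = split_90_10(data)
    models = fit_sex_stratified(train)
    preds = [predict(models, s, r, a) for s, r, a, _ in test]
    obs = [m for _, _, _, m in test]
    print("RMSE model:", rmse(preds, obs))
    print("RMSE self-report:", rmse([r for _, r, _, _ in test], obs))
    print("C-index:", c_index(preds, obs))
```

Refitting within each sex, rather than adding sex as a single covariate, lets both the intercept and the slopes differ between women and men.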
Models were trained on 62,032 NHANES participants (51.9% women, 21.7% Black, 23.9% Hispanic or Latino; median age 48 years [IQR 31-64]) and evaluated in the NHANES held-out test set (n=6,846), Add Health (n=5,749), and HRS (n=5,655). Models demonstrated excellent discrimination (C-index 0.88-0.89) and were well calibrated in all validation cohorts. Model-predicted height had lower RMSE than self-reported height in all validation cohorts and when stratified by race and ethnicity, with the greatest improvements among participants aged 45 years and older.
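A stratified comparison like the one reported above (RMSE of self-reported versus model-predicted height within subgroups such as age under 45 versus 45 and over) can be sketched as follows. This is a hypothetical illustration: the row layout, the `attrs` dictionary, and the function names are assumptions, and the toy numbers in the usage block are invented for demonstration, not drawn from NHANES, Add Health, or HRS.

```python
import math
from collections import defaultdict


def rmse(pred, obs):
    """Root mean square error between paired predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def rmse_by_stratum(rows, stratum_of):
    """Compare RMSE of self-reported vs model-predicted height within each stratum.

    rows: iterable of (self_reported_cm, predicted_cm, measured_cm, attrs), where
          attrs is a dict of stratifying variables (hypothetical layout).
    stratum_of: function mapping attrs to a stratum label.
    Returns {label: (rmse_self_reported, rmse_model, improvement)}, where
    improvement = rmse_self_reported - rmse_model (positive means the model helps).
    """
    groups = defaultdict(list)
    for reported, predicted, measured, attrs in rows:
        groups[stratum_of(attrs)].append((reported, predicted, measured))
    out = {}
    for label, g in groups.items():
        measured = [m for _, _, m in g]
        r_self = rmse([r for r, _, _ in g], measured)
        r_model = rmse([p for _, p, _ in g], measured)
        out[label] = (r_self, r_model, r_self - r_model)
    return out


def age_band(attrs):
    """Example stratifier: split at age 45, the threshold mentioned in the abstract."""
    return "45+" if attrs["age"] >= 45 else "<45"


if __name__ == "__main__":
    # Toy rows (invented for illustration, not study data).
    rows = [
        (170.0, 168.2, 168.0, {"age": 30}),
        (165.0, 163.5, 164.0, {"age": 38}),
        (172.0, 168.9, 169.0, {"age": 60}),
        (160.0, 156.8, 157.0, {"age": 71}),
    ]
    for label, (r_self, r_model, gain) in sorted(rmse_by_stratum(rows, age_band).items()):
        print(label, round(r_self, 3), round(r_model, 3), round(gain, 3))
```

Passing the stratifier as a function keeps the comparison generic: the same routine can stratify by sex, or by race and ethnicity, by swapping in a different `stratum_of`.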
A model requiring minimal input data improves height estimation over self-reported height at least as much as more complex models across strata of sex, age, and race and ethnicity in internal validation, and it is the first model for improving height estimation to demonstrate external validity.