Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
JMIR Public Health Surveill. 2024 Jul 2;10:e53330. doi: 10.2196/53330.
The prevalence of type 2 diabetes mellitus (DM) and pre-diabetes mellitus (pre-DM) among youth in the United States has been increasing in recent decades, creating an urgent need to understand and identify the associated risk factors. Such efforts, however, have been hindered by the lack of easily accessible youth pre-DM/DM data.
We aimed to first build a high-quality, comprehensive epidemiological data set focused on youth pre-DM/DM. Subsequently, we aimed to make these data accessible by creating a user-friendly web portal to share them and the corresponding codes. Through this, we hope to address this significant gap and facilitate youth pre-DM/DM research.
Building on data from the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2018, we cleaned and harmonized hundreds of variables relevant to pre-DM/DM (fasting plasma glucose level ≥100 mg/dL or glycated hemoglobin ≥5.7%) for youth aged 12-19 years (N=15,149). We identified individual factors associated with pre-DM/DM risk using bivariate statistical analyses and predicted pre-DM/DM status using our Ensemble Integration (EI) framework for multidomain machine learning. We then developed a user-friendly web portal named Prediabetes/diabetes in youth Online Dashboard (POND) to share the data and codes.
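The case definition above (fasting plasma glucose ≥100 mg/dL or glycated hemoglobin ≥5.7%) can be sketched as a small labeling function. This is only an illustrative sketch, not the authors' code: the function name, the argument names, and the handling of missing measurements are assumptions made for the example.

```python
from typing import Optional

# Thresholds taken from the case definition stated in the abstract.
FPG_THRESHOLD_MG_DL = 100.0   # fasting plasma glucose, mg/dL
HBA1C_THRESHOLD_PCT = 5.7     # glycated hemoglobin, percent


def has_pre_dm_or_dm(
    fpg_mg_dl: Optional[float], hba1c_pct: Optional[float]
) -> Optional[bool]:
    """Label a participant as pre-DM/DM if either biomarker meets its threshold.

    Assumption (not from the source): if both measurements are missing the
    status is unknown (None); if only one is available, it alone decides.
    """
    if fpg_mg_dl is None and hba1c_pct is None:
        return None
    fpg_high = fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL
    hba1c_high = hba1c_pct is not None and hba1c_pct >= HBA1C_THRESHOLD_PCT
    return fpg_high or hba1c_high


if __name__ == "__main__":
    # Hypothetical participants: (FPG in mg/dL, HbA1c in %)
    for fpg, hba1c in [(100.0, 5.0), (99.9, 5.6), (None, 5.7), (None, None)]:
        print(fpg, hba1c, has_pre_dm_or_dm(fpg, hba1c))
```

The "or" in the definition means a single elevated biomarker is sufficient. How the actual data set treats participants with a missing measurement is not stated in the abstract.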
We extracted 95 variables potentially relevant to pre-DM/DM risk, organized into 4 domains (sociodemographic, health status, diet, and other lifestyle behaviors). The bivariate analyses identified 27 significant correlates of pre-DM/DM (P<.001, Bonferroni adjusted), including race or ethnicity, health insurance, BMI, added sugar intake, and screen time. Of these 27 factors, 16 were also identified by the EI methodology (Fisher P of overlap=7.06×10). The EI approach identified a further 11 predictive variables, including some known factors (eg, meat and fruit intake and family income) and some less recognized ones (eg, number of rooms in homes). The factors identified in both analyses spanned all 4 domains. These data and results, as well as other exploratory tools, can be accessed on POND.
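The two statistical steps mentioned above, Bonferroni adjustment and a Fisher test of overlap between two variable lists, can be illustrated with the counts given in the text. This is a sketch under stated assumptions, not the authors' analysis: it assumes the universe is the 95 extracted variables, that the EI list has 27 members (16 shared plus 11 additional), and that the overlap test is one-sided. It is not guaranteed to reproduce the reported P value.

```python
from math import comb


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


def overlap_p_value(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """One-sided Fisher exact test for the overlap of two lists.

    Returns P(X >= overlap), where X is hypergeometric: the number of items
    from list A that appear in a random draw of size_b items from the universe.
    """
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Counts from the text; the universe and EI list size are assumptions.
    n_variables = 95   # variables extracted
    n_bivariate = 27   # significant in the bivariate analyses
    n_ei = 16 + 11     # assumed size of the EI-identified list
    n_shared = 16      # identified by both approaches

    print(overlap_p_value(n_variables, n_bivariate, n_ei, n_shared))
    # Hypothetical raw P value, adjusted for 95 tests.
    print(bonferroni_adjust(1e-5, n_variables))
```

Only the standard library is used here. With SciPy available, `scipy.stats.fisher_exact` on the corresponding 2×2 table with `alternative="greater"` computes the same one-sided test.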
Using NHANES data, we built one of the largest public epidemiological data sets for studying youth pre-DM/DM and identified potential risk factors using complementary analytical approaches. Our results align with the multifactorial nature of pre-DM/DM with correlates across several domains. Also, our data-sharing platform, POND, facilitates a wide range of applications to inform future youth pre-DM/DM studies.