Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, Australia.
Biostatistics Unit, Division of Research Methodology, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Melbourne, Australia.
PLoS One. 2021 May 5;16(5):e0250832. doi: 10.1371/journal.pone.0250832. eCollection 2021.
Using a nationally-representative, cross-sectional cohort, we examined nutritional markers of undiagnosed type 2 diabetes in adults via machine learning.
A total of 16429 men and non-pregnant women ≥ 20 years of age were analysed from five consecutive cycles of the National Health and Nutrition Examination Survey. Cohorts from years 2013-2016 (n = 6673) were used for external validation. Undiagnosed type 2 diabetes was determined by a negative response to the question "Have you ever been told by a doctor that you have diabetes?" and a positive glycaemic response to one or more of the three diagnostic tests (HbA1c > 6.4% or FPG > 125 mg/dl or 2-hr post-OGTT glucose > 200 mg/dl). Following a comprehensive literature search, 114 potential nutritional markers were modelled with 13 behavioural and 12 socio-economic variables. We tested three machine learning algorithms on the original training dataset and on resampled training datasets built using three resampling methods. The 12 resulting predictive models were validated on internal and external validation cohorts. Magnitudes of associations were gauged through odds ratios in logistic models and variable importance in others. Models were benchmarked against the ADA diabetes risk test.
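The case definition and model count described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name, argument names, and the algorithm and dataset labels are hypothetical placeholders, and treating a missing test result as "not positive" is an assumption the abstract does not state.

```python
from itertools import product
from typing import Optional


def is_undiagnosed_t2d(
    told_has_diabetes: bool,
    hba1c_pct: Optional[float] = None,
    fpg_mg_dl: Optional[float] = None,
    ogtt_2h_mg_dl: Optional[float] = None,
) -> bool:
    """Apply the abstract's definition of undiagnosed type 2 diabetes.

    A participant qualifies if they answered "no" to ever being told by a
    doctor that they have diabetes AND at least one of the three glycaemic
    tests exceeds its threshold. Missing results (None) are treated as not
    positive, which is an assumption made for this sketch.
    """
    if told_has_diabetes:
        return False  # already diagnosed, so not "undiagnosed"
    positive_tests = [
        hba1c_pct is not None and hba1c_pct > 6.4,          # HbA1c > 6.4%
        fpg_mg_dl is not None and fpg_mg_dl > 125,          # FPG > 125 mg/dl
        ogtt_2h_mg_dl is not None and ogtt_2h_mg_dl > 200,  # 2-hr OGTT > 200 mg/dl
    ]
    return any(positive_tests)


# Placeholder labels: the abstract does not name the algorithms or the
# resampling methods, only how many of each were used.
ALGORITHMS = ["algorithm_1", "algorithm_2", "algorithm_3"]
TRAINING_SETS = ["original", "resampled_1", "resampled_2", "resampled_3"]

# 3 algorithms x (1 original + 3 resampled training sets) = 12 models
MODEL_GRID = list(product(ALGORITHMS, TRAINING_SETS))
```

For example, `is_undiagnosed_t2d(False, hba1c_pct=6.5)` is true, whereas values sitting exactly at the thresholds (6.4%, 125 mg/dl, 200 mg/dl) do not count as positive because the inequalities are strict.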
The prevalence of undiagnosed type 2 diabetes was 5.26%. The four best-performing models (AUROC range: 74.9%-75.7%) identified 39 markers of undiagnosed type 2 diabetes: 28 via one or more of the three best-performing non-linear/ensemble models and 11 uniquely by the logistic model. They comprised 14 nutrient-based, 12 anthropometry-based, 9 socio-behavioural, and 4 diet-associated markers. The AUROCs of all models were on par with that of the ADA diabetes risk test on both internal and external validation cohorts (p > 0.05).
Models performed comparably to the chosen benchmark. Novel behavioural markers, such as the number of meals not prepared at home, were revealed. This approach may be useful in nutritional epidemiology to uncover new associations with type 2 diabetes.