School of Science, Nantong University, Nantong, 226019, China.
Bioelectromagnetics Laboratory, and Department of Reproductive Endocrinology of Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China.
Comput Methods Programs Biomed. 2023 Jun;235:107537. doi: 10.1016/j.cmpb.2023.107537. Epub 2023 Apr 5.
Increasing and compelling evidence has shown that urinary and dietary metal exposures are underappreciated but potentially modifiable biomarkers for type 2 diabetes mellitus (T2DM). The aims of this study were (1) to identify key potential biomarkers of T2DM using an effective and parsimonious set of features and (2) to assess the utility of baseline variables and metal exposure in the diagnosis of T2DM.
Based on the National Health and Nutrition Examination Survey (NHANES), we selected 9822 screening records with 82 significant variables covering demographics, lifestyle, anthropometric measures, diet and metal exposure. A soft voting ensemble model combining extreme gradient boosting (XGBoost), random forest and light gradient boosting machine (LightGBM) was proposed to measure the importance of the 82 features. Using this ensemble model together with the variance inflation factor (VIF), strongly multicollinear features with low importance scores were then removed from the candidate biomarkers. Finally, a soft voting ensemble classifier was used to demonstrate the effectiveness of the proposed feature selection method.
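The abstract does not state exactly how the three models' importance scores are combined or how VIF and importance interact, so the following is only a minimal sketch under stated assumptions, not the authors' code. It assumes that "soft voting" over importances means rescaling each model's importance vector to sum to 1 and averaging them (the raw scales differ; for example, LightGBM's scikit-learn wrapper reports split counts by default, while random forest importances already sum to 1). It also assumes a hypothetical pruning rule: while any feature's VIF exceeds 10 (a common rule of thumb, not taken from the paper), drop the least important of the flagged features. Model fitting is left out so the sketch needs only NumPy; in practice the importance vectors would come from the fitted models' `feature_importances_` attributes. All function names are hypothetical, and the data and importance values are made up for illustration.

```python
import numpy as np


def soft_vote_importance(importances):
    """Average per-model importance vectors after rescaling each to sum to 1.

    `importances` is a list of 1-D arrays (one per model) indexed by the
    same features, e.g. the `feature_importances_` of fitted XGBoost,
    random forest and LightGBM classifiers. (Assumed combination rule.)
    """
    scaled = []
    for imp in importances:
        imp = np.asarray(imp, dtype=float)
        total = imp.sum()
        # A model that assigns no importance at all contributes zeros.
        scaled.append(imp / total if total > 0 else np.zeros_like(imp))
    return np.mean(scaled, axis=0)


def vif(X):
    """VIF of each column of X: 1 / (1 - R^2), where R^2 comes from an
    ordinary least-squares regression of that column on all other columns
    plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def prune_collinear(X, names, importance, threshold=10.0):
    """Hypothetical pruning rule: while any feature's VIF exceeds
    `threshold` (10 is an assumed rule of thumb), drop the least
    important of the flagged features."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        flagged = [k for k, val in zip(keep, v) if val > threshold]
        if not flagged:
            break
        keep.remove(min(flagged, key=lambda k: importance[k]))
    return [names[k] for k in keep]


# Illustrative synthetic data (not NHANES): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Made-up importance vectors from three models on different scales.
ensemble_imp = soft_vote_importance([[2, 1, 1], [0.5, 0.25, 0.25], [10, 0, 10]])
selected = prune_collinear(X, ["x1", "x2", "x3"], ensemble_imp)
print(ensemble_imp, selected)
```

Here x1 and x2 are both flagged by VIF, and x2 is dropped because its averaged importance is lower, leaving x1 and x3. As a library alternative for the VIF step, statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; it does not add an intercept column itself, so a constant must be included in the design matrix.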
With the proposed feature selection method, 12 baseline variables and 3 metal variables were selected to detect patients at risk for T2DM. Among the metal variables, dietary copper (Cu), urinary cadmium (Cd) and urinary mercury (Hg) were selected as the most important metal exposures, each with P < 0.05. In a T2DM classification model with 12 baseline biomarkers, adding the 3 metal exposures improved classification performance, raising the area under the receiver operating characteristic (ROC) curve (AUC) from 0.792 to 0.847.
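The soft voting classifier and the AUC used to compare the two feature sets can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: it assumes equal model weights and a 0.5 decision cut-off, and the probabilities below are made up rather than taken from the study. The AUC is computed in its pairwise (Mann-Whitney) form, the probability that a randomly chosen positive record scores higher than a randomly chosen negative one. The function names are hypothetical; scikit-learn's `VotingClassifier(voting="soft")` and `roc_auc_score` provide library equivalents.

```python
def soft_vote_proba(prob_lists, weights=None):
    """Soft voting: (weighted) mean of each model's predicted probability
    of the positive class (T2DM) for every record. Equal weights are the
    assumed default."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive record
    scores higher than a randomly chosen negative one (ties count 1/2).
    This pairwise form is O(n_pos * n_neg), which is fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up probabilities from three hypothetical models for four records.
y = [0, 0, 1, 1]
p_a = [0.1, 0.6, 0.4, 0.9]
p_b = [0.2, 0.3, 0.7, 0.8]
p_c = [0.3, 0.5, 0.6, 0.4]
p_ens = soft_vote_proba([p_a, p_b, p_c])
labels = [int(p >= 0.5) for p in p_ens]  # 0.5 cut-off is an assumption
print(roc_auc(y, p_a), roc_auc(y, p_ens), labels)
```

In this toy case model A alone ranks one negative above one positive (AUC 0.75), while the averaged probabilities separate the classes perfectly. To mirror the comparison in the study, one would compute the AUC on the same held-out records for the model with the 12 baseline features and for the model with the 3 metal exposures added.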
This is the first demonstration of T2DM classification with machine learning that incorporates urinary and dietary metal exposure. The improved prediction precision illustrates the effectiveness of the proposed machine learning-based diagnostic model, which could facilitate lifestyle/dietary interventions for T2DM prevention.