Chang Shih-Tsung, Chou Ying-Hsiang, Nfor Oswald Ndi, Zhong Ji-Han, Huang Chien-Ning, Liaw Yung-Po
Institute of Medicine, Chung Shan Medical University, Taichung City, Taiwan.
Department of Radiation Oncology, Chung Shan Medical University Hospital, Taichung City, Taiwan.
J Diabetes Res. 2025 May 27;2025:5531934. doi: 10.1155/jdr/5531934. eCollection 2025.
Type 2 diabetes (T2D) is influenced by lifestyle, genetic, and environmental factors. Machine learning techniques can improve the precision of T2D risk prediction by capturing the complex interactions among these variables. This study aimed to identify key variables linked to T2D in the Taiwanese population and to assess their value for predicting T2D. The study included 3623 individuals with T2D and 14,492 without. Lifestyle and anthropometric data were obtained from the Taiwan Biobank. Statistical analyses were performed using Base SAS software and SAS Viya. Traditional models identified body mass index (BMI) and waist-hip ratio (WHR) as significant risk factors for T2D, with odds ratios (ORs) of 1.10 (95% confidence interval (CI) 1.09-1.12) and 1.10 (95% CI 1.09-1.11), respectively. Both variables remained important in the predictive models, with WHR being the most influential. In the overall population, the relative importance of BMI was 0.57, and it differed by gender (0.23 in men and 0.62 in women). Although cigarette smoking and certain genetic variants were significant in the traditional models, their importance decreased in the predictive models. Among all factors, WHR emerged as the most critical attribute for T2D, underscoring the complexity of T2D etiology. Overall, the random forest and ensemble classifiers were the most effective models, especially in the mixed-gender and female categories, highlighting their robust predictive performance.
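The abstract reports two kinds of summary statistic: odds ratios with 95% CIs from the traditional models (presumably logistic regression) and relative importance scores from the predictive models. The sketch below shows how such numbers are commonly derived. It assumes a Wald interval on the log-odds scale, and it assumes relative importance is each variable's raw importance divided by the largest one. That convention is consistent with WHR being the top variable, but the source does not confirm it. The function names `odds_ratio_ci` and `relative_importance` and all numeric inputs are hypothetical and are not values from the study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (95% by default)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def relative_importance(importances):
    """Scale raw variable importances so the top variable equals 1.0.
    This is an assumed convention; other tools normalize differently."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}


# Hypothetical inputs for illustration only; not values from the study.
or_, lo, hi = odds_ratio_ci(beta=0.095, se=0.006)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")

raw = {"WHR": 0.040, "BMI": 0.020, "smoking": 0.004}  # hypothetical raw scores
print(relative_importance(raw))
```

Under this convention, a relative importance of 0.57 would mean that a variable carries 57% of the importance of the top-ranked variable in that model. Comparisons across the overall, male, and female models are therefore relative to a different top variable in each model.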