Onthoni Djeane Debora, Chen Ying-Erh, Lai Yi-Hsuan, Li Guo-Hung, Zhuang Yong-Sheng, Lin Hong-Ming, Hsiao Yu-Ping, Onthoni Ade Indra, Chiou Hung-Yi, Chung Ren-Hua
Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
Department of Risk Management and Insurance, Tamkang University, New Taipei City, Taiwan.
J Diabetes Investig. 2025 Jan;16(1):25-35. doi: 10.1111/jdi.14328. Epub 2024 Oct 10.
AIMS/INTRODUCTION: This study aimed to identify low- and high-risk diabetes groups within prediabetes populations using data from the Taiwan Biobank (TWB) and UK Biobank (UKB) through a clustering-based Unsupervised Learning (UL) approach, to inform targeted type 2 diabetes (T2D) interventions.
MATERIALS AND METHODS: Clinical and genetic data from TWB and UKB were analyzed. Prediabetes was defined by glucose thresholds, and incident T2D was identified from follow-up data. K-means clustering was performed on participants with prediabetes, using features found to be significant by logistic regression and LASSO. Cluster stability was assessed using mean Jaccard similarity, the silhouette score, and the elbow method.
RESULTS: We identified two stable clusters, representing high- and low-risk diabetes groups, in both biobanks. Diabetes incidence was higher in the high-risk clusters (15.7% in TWB and 13.0% in UKB) than in the low-risk clusters (7.3% and 9.1%, respectively). Notably, males predominated in the high-risk groups, constituting 76.6% in TWB and 52.7% in UKB. In TWB, the high-risk group also had significantly higher BMI, fasting glucose, and triglycerides, whereas in UKB the differences in BMI and other metabolic indicators were only marginally significant. Current smoking was significantly associated with increased diabetes risk in the TWB high-risk group (P < 0.001). Kaplan-Meier curves indicated significant differences between clusters in the incidence of diabetes complications.
CONCLUSIONS: UL effectively identified risk-specific groups within prediabetes populations, with high-risk groups strongly associated with male gender, higher BMI, smoking, and metabolic markers. Tailored preventive strategies, particularly for young males in Taiwan, are crucial to reducing T2D risk.
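The clustering and stability checks described in the methods can be illustrated with a minimal sketch. It is not the authors' code: it runs on synthetic data rather than TWB or UKB records, it assumes scikit-learn's `KMeans`, and it assumes that "mean Jaccard similarity" refers to a bootstrap procedure (re-cluster resampled data, then match each reference cluster to its most similar bootstrap cluster), which the abstract does not specify. The helper names `jaccard` and `bootstrap_jaccard` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def jaccard(a, b):
    """Jaccard similarity between two collections of sample indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bootstrap_jaccard(X, k=2, n_boot=20, seed=0):
    """Mean bootstrap Jaccard similarity for each reference cluster (assumed procedure)."""
    rng = np.random.default_rng(seed)
    ref = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    scores = np.zeros((n_boot, k))
    for b in range(n_boot):
        # Resample rows with replacement and re-cluster the resampled data.
        idx = rng.choice(len(X), size=len(X), replace=True)
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx])
        sampled = np.unique(idx)
        for c in range(k):
            # Members of reference cluster c that appear in this resample.
            ref_members = sampled[ref[sampled] == c]
            # Score against the best-matching bootstrap cluster.
            scores[b, c] = max(
                jaccard(ref_members, np.unique(idx[labels == j])) for j in range(k)
            )
    return scores.mean(axis=0)


# Synthetic stand-in for standardized clinical features (not biobank data).
X_raw, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0, 0, 0], [6, 6, 6, 6]],
    cluster_std=1.0,
    random_state=0,
)
X = StandardScaler().fit_transform(X_raw)

# Silhouette score for several candidate numbers of clusters.
sil_by_k = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sil_by_k[k] = silhouette_score(X, labels_k)

best_k = max(sil_by_k, key=sil_by_k.get)
stability = bootstrap_jaccard(X, k=best_k)
print("best k by silhouette:", best_k)
print("mean Jaccard per cluster:", stability)
```

In this sketch, a mean Jaccard value close to 1 for a cluster means that the cluster reappears almost unchanged across resamples. The elbow method mentioned in the abstract could be added by recording `KMeans(...).fit(X).inertia_` for each candidate `k`.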