Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru.
BMJ Open Diabetes Res Care. 2021 Jan;9(1). doi: 10.1136/bmjdrc-2020-001889.
We aimed to identify clusters of people with type 2 diabetes mellitus (T2DM) and to assess whether the frequency of these clusters was consistent across selected countries in Latin America and the Caribbean (LAC).
We analyzed 13 population-based national surveys in nine countries (n=8361). We used k-means to develop a clustering model; predictors were age, sex, body mass index (BMI), waist circumference (WC), systolic/diastolic blood pressure (SBP/DBP), and T2DM family history. The training data set included all surveys, and the clusters were then predicted in each country-year data set. We used Euclidean distance together with elbow and silhouette plots to select the optimal number of clusters, and described each cluster by its underlying predictors (means and proportions).
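The cluster-selection step described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' code: the data are synthetic (two hypothetical predictors resembling BMI and SBP), and the helper names `standardize`, `kmeans`, `best_of`, and `mean_silhouette` are assumptions made for this example. It computes the within-cluster sum of squares (the quantity shown in an elbow plot) and the mean silhouette for several candidate values of k.

```python
import math
import random


def standardize(rows):
    """Scale each column to mean 0 and SD 1 so no predictor dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(rows, k, seed=0, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def best_of(rows, k, n_init=10):
    """Run k-means from several random starts; keep the fit with the lowest WCSS."""
    best = None
    for seed in range(n_init):
        labels, centroids = kmeans(rows, k, seed=seed)
        wcss = sum(math.dist(r, centroids[lab]) ** 2 for r, lab in zip(rows, labels))
        if best is None or wcss < best[0]:
            best = (wcss, labels, centroids)
    return best


def mean_silhouette(rows, labels):
    """Mean silhouette width: (b - a) / max(a, b), averaged over all records."""
    k = max(labels) + 1
    scores = []
    for i, (r, li) in enumerate(zip(rows, labels)):
        dist_sum, count = [0.0] * k, [0] * k
        for j, (q, lj) in enumerate(zip(rows, labels)):
            if i != j:
                dist_sum[lj] += math.dist(r, q)
                count[lj] += 1
        others = [dist_sum[c] / count[c] for c in range(k) if c != li and count[c]]
        if count[li] == 0 or not others:  # singleton or only one cluster: score 0
            scores.append(0.0)
            continue
        a, b = dist_sum[li] / count[li], min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic data only: three groups of 30 records, hypothetical (BMI, SBP) values.
rng = random.Random(42)
centres = [(24.0, 120.0), (31.0, 135.0), (38.0, 150.0)]
raw = [[rng.gauss(b, 1.0), rng.gauss(s, 4.0)] for b, s in centres for _ in range(30)]
X = standardize(raw)

results = {}
for k in range(2, 7):
    wcss, labels, _ = best_of(X, k)
    results[k] = (wcss, mean_silhouette(X, labels))

best_k = max(results, key=lambda k: results[k][1])
for k, (wcss, sil) in results.items():
    print(f"k={k}  WCSS={wcss:.2f}  mean silhouette={sil:.3f}")
print("k with the highest mean silhouette:", best_k)
```

In practice the two criteria are read together: the elbow plot shows where additional clusters stop reducing the within-cluster sum of squares appreciably, and the silhouette indicates how well separated the resulting clusters are. Standardizing the predictors first is an assumption of this sketch; the abstract does not state how the predictors were scaled.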
The optimal number of clusters was 4. Cluster 0 grouped more men and those with the highest mean SBP/DBP. Cluster 1 had the highest mean BMI and WC, as well as the largest proportion of T2DM family history. Cluster 2 had the lowest values of all predictors. Cluster 3 had the highest mean age. When we predicted the four clusters in each country-year data set, their distribution varied. For example, cluster 3 was the most frequent cluster in the training data set and also in 7 of the 13 country-year data sets.
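Predicting the trained clusters in each country-year data set amounts to assigning every record to its nearest centroid and tabulating the cluster shares per survey. The sketch below illustrates that step under stated assumptions: the centroids, survey names, and records are all hypothetical, and `predict` and `cluster_shares` are names introduced for this example rather than the authors' functions.

```python
import math
from collections import Counter


def predict(row, centroids):
    """Assign a record to the nearest trained centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(row, centroids[j]))


def cluster_shares(rows, centroids):
    """Proportion of records in a data set that fall into each cluster."""
    counts = Counter(predict(r, centroids) for r in rows)
    return {j: counts.get(j, 0) / len(rows) for j in range(len(centroids))}


# Hypothetical centroids from a pooled training fit. New records must be put on
# the same scale as the training data before distances are computed.
centroids = [[0.0, 0.0], [3.0, 3.0]]

# Hypothetical country-year data sets (already on the training scale).
surveys = {
    "country_A_2015": [[0.1, -0.2], [0.3, 0.1], [2.9, 3.2]],
    "country_B_2017": [[3.1, 2.8], [2.7, 3.3], [0.2, 0.0], [3.4, 3.1]],
}

shares = {name: cluster_shares(rows, centroids) for name, rows in surveys.items()}
most_frequent = {name: max(s, key=s.get) for name, s in shares.items()}

for name, s in shares.items():
    print(name, s, "most frequent cluster:", most_frequent[name])
```

Comparing `most_frequent` across surveys is one simple way to check whether the cluster that dominates the pooled training data also dominates each country-year data set.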
Unsupervised machine learning algorithms made it possible to cluster people with T2DM from the general population in LAC; the clusters showed distinct profiles that could be used to identify the underlying characteristics of the T2DM population in LAC.