Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach.

Affiliations

Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany.

Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany.

Publication information

Nutr Diabetes. 2022 May 27;12(1):27. doi: 10.1038/s41387-022-00206-2.

PMID:35624098

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9142500/

Abstract

BACKGROUND

Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. Here we focus on detecting T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which covers 10,125 T2DM patients and a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns.

METHODS

Epidemiological data are challenging to analyse because they contain features of diverse types. Applying the state-of-the-art dimension reduction tool UMAP in the conventional way proved ineffective for the NFHS-4 dataset, which mixes such feature types. We therefore implemented a distributed clustering workflow that combines different similarity measure settings of UMAP, clustering continuous, ordinal and nominal features separately. We then integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data.
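The feature-type split described in the methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the column names, the toy records and the specific measures (min-max scaled Euclidean distance for continuous features, scaled rank difference for ordinal features, Hamming distance for nominal features) are hypothetical choices, because the abstract does not say which UMAP similarity settings were used. Each resulting matrix could, for example, be passed to its own `umap.UMAP(metric="precomputed")` run in the umap-learn library. The helper `concat_embeddings` shows one simple way (also an assumption) to join the per-type low-dimensional coordinates before the final clustering.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]

# Hypothetical feature-type split for illustration only; the abstract does not
# list which NFHS-4 variables were assigned to each type.
CONTINUOUS = ["age", "bmi"]
ORDINAL = {"wealth": ["poorest", "poorer", "middle", "richer", "richest"]}
NOMINAL = ["residence", "diet"]


def continuous_dist(a: Record, b: Record, ranges: Dict[str, float]) -> float:
    """Euclidean distance on min-max scaled continuous features."""
    return math.sqrt(sum(((a[c] - b[c]) / ranges[c]) ** 2 for c in CONTINUOUS))


def ordinal_dist(a: Record, b: Record) -> float:
    """Mean absolute rank difference, each ordinal scale mapped to [0, 1]."""
    total = 0.0
    for col, levels in ORDINAL.items():
        gap = abs(levels.index(a[col]) - levels.index(b[col]))
        total += gap / (len(levels) - 1)
    return total / len(ORDINAL)


def nominal_dist(a: Record, b: Record) -> float:
    """Hamming distance: share of nominal features whose categories differ."""
    return sum(a[c] != b[c] for c in NOMINAL) / len(NOMINAL)


def distance_matrix(rows: Sequence[Record],
                    dist: Callable[[Record, Record], float]) -> List[List[float]]:
    """Square pairwise-distance matrix for one feature-type block."""
    return [[dist(a, b) for b in rows] for a in rows]


def concat_embeddings(*blocks: List[List[float]]) -> List[List[float]]:
    """Join per-block low-dimensional coordinates row by row (one row per patient)."""
    return [[v for block in blocks for v in block[i]] for i in range(len(blocks[0]))]


if __name__ == "__main__":
    # Invented toy records, not study data.
    patients = [
        {"age": 40, "bmi": 22.0, "wealth": "poorer", "residence": "rural", "diet": "non-veg"},
        {"age": 62, "bmi": 31.0, "wealth": "richer", "residence": "urban", "diet": "non-veg"},
        {"age": 45, "bmi": 23.5, "wealth": "poorest", "residence": "rural", "diet": "veg"},
    ]
    # Range of each continuous feature, guarding against a zero range.
    ranges = {c: (max(p[c] for p in patients) - min(p[c] for p in patients)) or 1.0
              for c in CONTINUOUS}
    d_cont = distance_matrix(patients, lambda a, b: continuous_dist(a, b, ranges))
    d_ord = distance_matrix(patients, ordinal_dist)
    d_nom = distance_matrix(patients, nominal_dist)
    # Each matrix would feed its own dimension-reduction run.
    print(d_ord[0])  # [0.0, 0.5, 0.25]
    print(d_nom[0])  # [0.0, 0.5, 0.5]
```

A design note: keeping the blocks separate lets each feature type use a measure that respects its scale. For instance, label-encoded nominal categories compared with a single Euclidean metric would acquire an ordering they do not actually have.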

RESULTS

Our analysis reveals four significant clusters, two of which consist mainly of non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mainly of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients followed a non-vegetarian diet, although they did not show an increased intake of plant-based protein-rich foods.
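Characterizing clusters in this way amounts to summarizing each feature within each cluster label. The sketch below assumes cluster labels are already available from a clustering step such as the one described under METHODS. The field names and toy records are hypothetical, and the obesity cut-off of BMI ≥ 30 kg/m² (the general WHO threshold) is an assumption, since the abstract does not state which threshold the study applied.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Sequence

OBESITY_BMI = 30.0  # assumed cut-off (general WHO threshold); not stated in the abstract


def profile_clusters(records: Sequence[Dict[str, Any]],
                     labels: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """Summarize age, residence, diet and obesity within each cluster label."""
    groups: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)
    summary = {}
    for lab in sorted(groups):
        recs = groups[lab]
        n = len(recs)
        summary[lab] = {
            "n": n,
            "mean_age": mean(r["age"] for r in recs),
            "share_rural": sum(r["residence"] == "rural" for r in recs) / n,
            "share_non_veg": sum(r["diet"] == "non-veg" for r in recs) / n,
            "share_obese": sum(r["bmi"] >= OBESITY_BMI for r in recs) / n,
        }
    return summary


if __name__ == "__main__":
    # Invented toy records and labels, not study data.
    toy = [
        {"age": 40, "bmi": 22.0, "residence": "rural", "diet": "non-veg"},
        {"age": 50, "bmi": 24.0, "residence": "urban", "diet": "veg"},
        {"age": 60, "bmi": 32.0, "residence": "urban", "diet": "non-veg"},
    ]
    for label, stats in profile_clusters(toy, [0, 0, 1]).items():
        print(label, stats)
```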

CONCLUSIONS

From a methodological perspective, we show that for the diverse data types frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, whereas the conventional use of the UMAP algorithm is not. Applying a UMAP-based clustering workflow to this type of dataset is novel in itself. Our findings demonstrate heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. We conclude that the existence of significant non-obese T2DM sub-populations, characterized by younger age and economic disadvantage, calls for different T2DM screening criteria for rural Indian residents.
