Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach.

Affiliations

Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany.

Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany.

Publication information

Nutr Diabetes. 2022 May 27;12(1):27. doi: 10.1038/s41387-022-00206-2.

Abstract

BACKGROUND

Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. Here we focus on detecting T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which covers a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns, for 10,125 T2DM patients.

METHODS

Epidemiological data are challenging to analyse because of the diverse feature types they contain. For the NFHS-4 dataset, applying the state-of-the-art dimension reduction tool UMAP in the conventional way was found to be ineffective. We implemented a distributed clustering workflow that combines different similarity measure settings of UMAP to cluster continuous, ordinal and nominal features separately. We then integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data.
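
The idea of treating each feature type with its own similarity measure can be illustrated with a minimal sketch. The abstract does not state which measure was paired with which feature type, so the pairing below (Euclidean for continuous, Canberra for ordinal, Hamming for nominal) is an assumption, and the toy records and all function names are hypothetical. The sketch covers only the per-feature-type distance step; the UMAP embedding and the integration of the reduced dimensions are not reproduced.

```python
from math import sqrt

# Hypothetical toy records (not NFHS-4 data): each patient carries
# continuous, ordinal and nominal features in separate groups.
patients = [
    {"continuous": [52.0, 31.5], "ordinal": [1, 3], "nominal": ["rural", "veg"]},
    {"continuous": [47.0, 22.1], "ordinal": [2, 3], "nominal": ["rural", "nonveg"]},
    {"continuous": [63.0, 33.0], "ordinal": [1, 1], "nominal": ["urban", "veg"]},
]

def euclidean(a, b):
    """Straight-line distance, a common choice for continuous features."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total

def hamming(a, b):
    """Fraction of positions at which two category vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Assumed pairing of feature type to measure (illustrative only).
METRICS = {"continuous": euclidean, "ordinal": canberra, "nominal": hamming}

def distance_matrices(records):
    """Return one pairwise distance matrix per feature type."""
    out = {}
    for ftype, metric in METRICS.items():
        rows = [r[ftype] for r in records]
        out[ftype] = [[metric(a, b) for b in rows] for a in rows]
    return out

matrices = distance_matrices(patients)
```

In a UMAP-based version of this step, each feature group would instead be embedded on its own, for example via the `metric` argument of `umap.UMAP` in the umap-learn package, and the resulting low-dimensional coordinates would be concatenated before clustering. Continuous features would normally be standardized first; that step is omitted here for brevity.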

RESULTS

Our analysis reveals four significant clusters, two of which comprise mainly non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mainly of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients practised a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods.

CONCLUSIONS

From a methodological perspective, we show that for the diverse data types frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, in contrast to the conventional use of the UMAP algorithm. The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. Our findings demonstrate heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. We conclude that the existence of significant non-obese T2DM sub-populations, characterized by younger age groups and economic disadvantage, raises the need for different screening criteria for T2DM among rural Indian residents.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3cc/9142500/833de99da4a1/41387_2022_206_Fig1_HTML.jpg
