Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach.

Affiliations

Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany.

Leibniz-Institute for Food Systems Biology at the Technical University Munich, Munich, Germany.

Publication information

Nutr Diabetes. 2022 May 27;12(1):27. doi: 10.1038/s41387-022-00206-2.

DOI:10.1038/s41387-022-00206-2
PMID:35624098
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9142500/
Abstract

BACKGROUND

Studies on Type-2 Diabetes Mellitus (T2DM) have revealed sub-populations that are heterogeneous in their underlying pathologies. However, the identification of such sub-populations in epidemiological datasets remains unexplored. Here we focus on detecting T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which covers a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns, for 10,125 T2DM patients.

METHODS

Epidemiological data are challenging to analyse because they mix diverse feature types. Applying the state-of-the-art dimension reduction tool UMAP in the conventional way was found to be ineffective for the NFHS-4 dataset for this reason. We implemented a distributed clustering workflow that combines different similarity measure settings of UMAP, so that continuous, ordinal and nominal features are clustered separately. We then integrated the reduced dimensions from each feature-type-specific clustering to obtain an interpretable and unbiased clustering of the data.
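
The idea of using a different similarity measure for each feature type can be illustrated with a minimal sketch. It is not the authors' implementation: the column names, the two example records, and the pairing of Euclidean, Manhattan and Hamming distances with continuous, ordinal and nominal features are all assumptions made for illustration. The sketch only computes the per-type dissimilarities between two records; it does not run UMAP itself.

```python
from math import sqrt

# Hypothetical column names grouped by feature type (not taken from NFHS-4).
CONTINUOUS = ["age", "bmi"]
ORDINAL = ["wealth_index", "fish_frequency"]
NOMINAL = ["residence", "diet"]


def euclidean(a, b):
    """Distance for continuous features."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Distance for ordinal features encoded as integer ranks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Share of nominal features on which two records disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Assumed pairing of feature type -> (columns, distance function).
METRIC_BY_TYPE = {
    "continuous": (CONTINUOUS, euclidean),
    "ordinal": (ORDINAL, manhattan),
    "nominal": (NOMINAL, hamming),
}


def per_type_distances(rec_a, rec_b):
    """Return one dissimilarity per feature type for two records (dicts)."""
    out = {}
    for ftype, (cols, metric) in METRIC_BY_TYPE.items():
        out[ftype] = metric([rec_a[c] for c in cols], [rec_b[c] for c in cols])
    return out


# Two made-up records, for illustration only.
p1 = {"age": 40.0, "bmi": 22.0, "wealth_index": 1, "fish_frequency": 3,
      "residence": "rural", "diet": "non-vegetarian"}
p2 = {"age": 43.0, "bmi": 26.0, "wealth_index": 4, "fish_frequency": 2,
      "residence": "urban", "diet": "non-vegetarian"}

distances = per_type_distances(p1, p2)
print(distances)
```

In a UMAP-based workflow, choices like these would correspond to the `metric` argument of `umap.UMAP` in the umap-learn package, which accepts names such as "euclidean", "manhattan" and "hamming". One embedding would be computed per feature type, and the resulting low-dimensional coordinates would then be combined (for example, concatenated) before clustering.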

RESULTS

Our analysis reveals four significant clusters, two of which consist mainly of non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mainly of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients practised a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods.

CONCLUSIONS

From a methodological perspective, we show that for the diverse data types that are frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, whereas the conventional use of the UMAP algorithm is not. The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. Our findings demonstrate heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. We conclude that the existence of significant non-obese T2DM sub-populations, characterized by younger age groups and economic disadvantage, raises the need for different T2DM screening criteria among rural Indian residents.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3cc/9142500/d0070473d78d/41387_2022_206_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3cc/9142500/833de99da4a1/41387_2022_206_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3cc/9142500/ea5ca77fde9d/41387_2022_206_Fig2_HTML.jpg
