Identification of Clusters in a Population With Obesity Using Machine Learning: Secondary Analysis of The Maastricht Study.

Authors

Beuken Maik Jm, Kleynen Melanie, Braun Susy, Van Berkel Kees, van der Kallen Carla, Koster Annemarie, Bosma Hans, Berendschot Tos Tjm, Houben Alfons Jhm, Dukers-Muijrers Nicole, van den Bergh Joop P, Kroon Abraham A, Kanera Iris M

Affiliations

Faculty of Financial Management, Research Center for Statistics & Data Science, Zuyd University of Applied Sciences, Sittard, Netherlands.

Faculty of Health, School of Physiotherapy, Research Center for Nutrition, Lifestyle and Exercise, Zuyd University of Applied Sciences, Heerlen, Netherlands.

Publication

JMIR Med Inform. 2025 Feb 5;13:e64479. doi: 10.2196/64479.

DOI:10.2196/64479
PMID:39908080
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11840370/

Abstract

BACKGROUND

Modern lifestyle risk factors, such as physical inactivity and poor nutrition, contribute to rising rates of obesity and of chronic diseases such as type 2 diabetes and heart disease. Personalized interventions in particular have been shown to be effective for long-term behavior change. Machine learning can be used to uncover insights without predefined hypotheses, revealing complex relationships and distinct population clusters. New data-driven approaches, such as the factor probabilistic distance clustering algorithm, provide opportunities to identify potentially meaningful clusters within large and complex datasets.

OBJECTIVE

This study aimed to identify potential clusters and relevant variables among individuals with obesity using a data-driven and hypothesis-free machine learning approach.

METHODS

We used cross-sectional data from individuals with abdominal obesity from The Maastricht Study. Data (2971 variables) included demographics, lifestyle, biomedical aspects, advanced phenotyping, and social factors (cohort 2010). The factor probabilistic distance clustering algorithm was applied to detect clusters within this high-dimensional data. To identify a subset of distinct, minimally redundant, predictive variables, we used the statistically equivalent signature algorithm. To describe the clusters, we applied measures of central tendency and variability, and we assessed the distinctiveness of the clusters through the emerged variables using the F test for continuous variables and the chi-square test for categorical variables at a significance level of α=.001.
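
As a rough illustration of the F test named above, the following sketch computes a one-way ANOVA F statistic from group summary statistics alone. It plugs in the energy-intake means, SDs, and group sizes that the Results report for cluster 1 versus clusters 2 and 3 combined. The function name `f_statistic_from_summaries` is hypothetical, and it is an assumption that the authors' test corresponds to this pooled-variance form, so the value need not match theirs; the inputs are also rounded as published.

```python
from typing import Sequence, Tuple

# Each group is described only by (sample size, mean, standard deviation).
Group = Tuple[int, float, float]


def f_statistic_from_summaries(groups: Sequence[Group]) -> float:
    """One-way ANOVA F statistic computed from per-group (n, mean, sd)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group sum of squares, recovered from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Energy intake (kcal/day) as reported in the Results (rounded values).
cluster_1 = (1458, 1684.0, 393.0)
clusters_2_and_3 = (2670, 2358.0, 635.0)

f_value = f_statistic_from_summaries([cluster_1, clusters_2_and_3])
df_within = cluster_1[0] + clusters_2_and_3[0] - 2
print(f"F(1, {df_within}) = {f_value:.1f}")
```

With only two groups, this F statistic equals the square of the pooled-variance t statistic, so either test leads to the same conclusion for these comparisons.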

RESULTS

We identified 3 distinct clusters (including 4128/9188, 44.93% of all data points) among individuals with obesity (n=4128). The most significant continuous variable for distinguishing cluster 1 (n=1458) from clusters 2 and 3 combined (n=2670) was the lower energy intake (mean 1684, SD 393 kcal/day vs mean 2358, SD 635 kcal/day; P<.001). The most significant categorical variable was occupation (P<.001). A significantly higher proportion (1236/1458, 84.77%) in cluster 1 did not work compared to clusters 2 and 3 combined (1486/2670, 55.66%; P<.001). For cluster 2 (n=1521), the most significant continuous variable was a higher energy intake (mean 2755, SD 506.2 kcal/day vs mean 1749, SD 375 kcal/day; P<.001). The most significant categorical variable was sex (P<.001). A significantly higher proportion (997/1521, 65.55%) in cluster 2 were male compared to the other 2 clusters (885/2607, 33.95%; P<.001). For cluster 3 (n=1149), the most significant continuous variable was overall higher cognitive functioning (mean 0.2349, SD 0.5702 vs mean -0.3088, SD 0.7212; P<.001), and educational level was the most significant categorical variable (P<.001). A significantly higher proportion (475/1149, 41.34%) in cluster 3 received higher vocational or university education in comparison to clusters 1 and 2 combined (729/2979, 24.47%; P<.001).
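
The categorical comparisons above can be approximately rechecked from the reported counts alone. The sketch below runs a Pearson chi-square test on the 2×2 table for "did not work" in cluster 1 versus clusters 2 and 3 combined. Omitting the continuity correction is an assumption (the abstract does not say which variant was used), and the helper name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts reported in the Results: participants who did not work.
not_working_c1, n_c1 = 1236, 1458          # cluster 1
not_working_rest, n_rest = 1486, 2670      # clusters 2 and 3 combined

# Recompute the reported proportions as percentages.
pct_c1 = round(100 * not_working_c1 / n_c1, 2)
pct_rest = round(100 * not_working_rest / n_rest, 2)

stat, p_value = chi_square_2x2(
    not_working_c1, n_c1 - not_working_c1,
    not_working_rest, n_rest - not_working_rest,
)
print(f"{pct_c1}% vs {pct_rest}%: chi-square = {stat:.1f}, p < .001 is {p_value < 0.001}")
```

The same helper can be applied to the sex and education comparisons by substituting the corresponding counts from the Results.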

CONCLUSIONS

This study demonstrates that a hypothesis-free and fully data-driven approach can be used to identify distinguishable participant clusters in large and complex datasets and find relevant variables that differ within populations with obesity.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9111/11840370/fd3c45c10ec8/medinform_v13i1e64479_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9111/11840370/c5a16b6ebe6f/medinform_v13i1e64479_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9111/11840370/3540cd39740d/medinform_v13i1e64479_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9111/11840370/bfdbc9e09067/medinform_v13i1e64479_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9111/11840370/e0324d78097c/medinform_v13i1e64479_fig5.jpg
