Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping.

Authors

Horne Elsie, Tibble Holly, Sheikh Aziz, Tsanas Athanasios

Affiliation

Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom.

Publication details

JMIR Med Inform. 2020 May 28;8(5):e16452. doi: 10.2196/16452.

DOI: 10.2196/16452
PMID: 32463370
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7290450/
Abstract

BACKGROUND

In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging.

OBJECTIVE

This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies.

METHODS

We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process.

RESULTS

Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of a mixed type in 47 (75%) studies and continuous in 12 (19%), and the feature type was unclear in the remaining 4 (6%) studies. A total of 23 (37%) studies used hierarchical clustering with Ward linkage, and 22 (35%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14%) studies used a preclustering step to create small clusters to feed on a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349. The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86%) explained the methods used to determine the number of clusters, 24 (38%) studies tested the quality of their cluster solution, and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification.
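To illustrate what a dissimilarity measure that can handle mixed-type features looks like, here is a minimal sketch of the Gower dissimilarity in plain Python. The abstract does not say which measures the 5 studies used, so Gower is chosen here only as a well-known example. The patient records, feature order, and function names are hypothetical.

```python
from itertools import combinations


def feature_ranges(rows, is_numeric):
    """Range (max - min) of each numeric column; 0.0 for categorical columns."""
    ranges = []
    for col, numeric in enumerate(is_numeric):
        values = [row[col] for row in rows if row[col] is not None]
        ranges.append(max(values) - min(values) if numeric and values else 0.0)
    return ranges


def gower_dissimilarity(a, b, is_numeric, ranges):
    """Gower dissimilarity between two records with mixed-type features."""
    total, used = 0.0, 0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if x is None or y is None:
            continue  # skip features missing in either record
        if numeric:
            # range-normalised absolute difference, always in [0, 1]
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            # simple matching for categorical features
            total += 0.0 if x == y else 1.0
        used += 1
    if used == 0:
        raise ValueError("records share no non-missing features")
    return total / used


def gower_matrix(rows, is_numeric):
    """Symmetric matrix of pairwise Gower dissimilarities."""
    ranges = feature_ranges(rows, is_numeric)
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = gower_dissimilarity(rows[i], rows[j], is_numeric, ranges)
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical records: (age in years, FEV1 % predicted, sex, atopy)
patients = [
    (34.0, 92.0, "F", "yes"),
    (61.0, 58.0, "M", "no"),
    (36.0, 88.0, "F", "no"),
]
is_numeric = (True, True, False, False)
D = gower_matrix(patients, is_numeric)
```

A matrix like `D` can be passed to any method that accepts precomputed dissimilarities, such as medoid-based methods or hierarchical clustering with a linkage that does not assume Euclidean geometry. Ward linkage and k-means, by contrast, are normally defined in terms of Euclidean distances on continuous features, which is why applying them to mixed-type features needs explicit justification.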

CONCLUSIONS

This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
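Considering the cluster analysis process itself includes checking the quality of a solution, which only 24 of the 63 reviewed studies did. Below is a minimal sketch of one common internal quality index, the silhouette width, computed from a precomputed dissimilarity matrix. The abstract does not name a specific index, so this choice is an assumption made for illustration. The toy matrix and labels are hypothetical.

```python
def silhouette_scores(dissim, labels):
    """Per-record silhouette widths from a precomputed dissimilarity matrix."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the same cluster
        a = sum(dissim[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to any other cluster
        b = min(
            sum(dissim[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean(values):
    return sum(values) / len(values)


# Toy example: four records at positions 0, 1, 10, 11 on a line.
positions = [0.0, 1.0, 10.0, 11.0]
D_line = [[abs(p - q) for q in positions] for p in positions]

good = mean(silhouette_scores(D_line, [0, 0, 1, 1]))  # separates the two groups
bad = mean(silhouette_scores(D_line, [0, 1, 0, 1]))   # mixes the two groups
```

Mean widths close to 1 suggest compact, well-separated clusters; values near 0 or below suggest overlapping or misassigned records. An internal index of this kind measures quality only. Stability is a separate question, typically checked by re-clustering resampled data and comparing the resulting partitions.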

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3447/7290450/1dd3ead1f0b2/medinform_v8i5e16452_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3447/7290450/19cdaaa4812e/medinform_v8i5e16452_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3447/7290450/3642eb0c0824/medinform_v8i5e16452_fig2.jpg

Similar articles

1
Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping.
JMIR Med Inform. 2020 May 28;8(5):e16452. doi: 10.2196/16452.
2
Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning.
BMC Med Inform Decis Mak. 2021 Dec 8;21(1):343. doi: 10.1186/s12911-021-01693-6.
3
Subtyping of children with developmental dyslexia via bootstrap aggregated clustering and the gap statistic: comparison with the double-deficit hypothesis.
Int J Lang Commun Disord. 2007 Jan-Feb;42(1):77-95. doi: 10.1080/13682820600806680.
4
Sheep's coping style can be identified by unsupervised machine learning from unlabeled data.
Behav Processes. 2022 Jan;194:104559. doi: 10.1016/j.beproc.2021.104559. Epub 2021 Nov 25.
5
Machine-learned cluster identification in high-dimensional data.
J Biomed Inform. 2017 Feb;66:95-104. doi: 10.1016/j.jbi.2016.12.011. Epub 2016 Dec 28.
6
Feature selection for unsupervised machine learning of accelerometer data physical activity clusters - A systematic review.
Gait Posture. 2021 Oct;90:120-128. doi: 10.1016/j.gaitpost.2021.08.007. Epub 2021 Aug 13.
7
Pathway-based deep clustering for molecular subtyping of cancer.
Methods. 2020 Feb 15;173:24-31. doi: 10.1016/j.ymeth.2019.06.017. Epub 2019 Jun 25.
8
Cancer subtyping with heterogeneous multi-omics data via hierarchical multi-kernel learning.
Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac488.
9
Features of asthma which provide meaningful insights for understanding the disease heterogeneity.
Clin Exp Allergy. 2018 Jan;48(1):39-47. doi: 10.1111/cea.13014. Epub 2017 Sep 15.
10
Asthma clustering methods: a literature-informed application to the children's health study data.
J Asthma. 2022 Jul;59(7):1305-1318. doi: 10.1080/02770903.2021.1923738. Epub 2021 May 18.

Cited by

1
Protocol for development of a checklist and guideline for transparent reporting of cluster analyses (TRoCA).
BMJ Open. 2025 Aug 21;15(8):e099609. doi: 10.1136/bmjopen-2025-099609.
2
Clustering of electronic health records in atrial fibrillation patients and impact on prognosis and patient trajectories: a UK linked-dataset study.
Eur Heart J Digit Health. 2025 Apr 5;6(4):797-810. doi: 10.1093/ehjdh/ztaf032. eCollection 2025 Jul.
3
Evaluating the kidney disease progression using a comprehensive patient profiling algorithm: A hybrid clustering approach.
PLoS One. 2025 Jul 11;20(7):e0310749. doi: 10.1371/journal.pone.0310749. eCollection 2025.
4
A modified and weighted Gower distance-based clustering analysis for mixed type data: a simulation and empirical analyses.
BMC Med Res Methodol. 2024 Dec 18;24(1):305. doi: 10.1186/s12874-024-02427-8.
5
ClustAll: An R package for patient stratification in complex diseases.
PLoS Comput Biol. 2024 Dec 13;20(12):e1012656. doi: 10.1371/journal.pcbi.1012656. eCollection 2024 Dec.
6
A robust clustering strategy for stratification unveils unique patient subgroups in acutely decompensated cirrhosis.
J Transl Med. 2024 Jun 27;22(1):599. doi: 10.1186/s12967-024-05386-2.
7
Finding Similarities in Differences Between Autistic Adults: Two Replicated Subgroups.
J Autism Dev Disord. 2024 Sep;54(9):3449-3466. doi: 10.1007/s10803-023-06042-2. Epub 2023 Jul 12.
8
Capability, Opportunity, and Motivation Model for Behavior Change in People With Asthma: Protocol for a Cross-Sectional Study.
JMIR Res Protoc. 2023 Jul 6;12:e44710. doi: 10.2196/44710.
9
Evaluation of data processing pipelines on real-world electronic health records data for the purpose of measuring patient similarity.
PLoS One. 2023 Jun 15;18(6):e0287264. doi: 10.1371/journal.pone.0287264. eCollection 2023.
10
Influence of User Profile Attributes on e-Cigarette-Related Searches on YouTube: Machine Learning Clustering and Classification.
JMIR Infodemiology. 2023 Apr 12;3:e42218. doi: 10.2196/42218. eCollection 2023.

References

1
Grey matter abnormalities are associated only with severe cognitive decline in early stages of Parkinson's disease.
Cortex. 2020 Feb;123:1-11. doi: 10.1016/j.cortex.2019.09.015. Epub 2019 Oct 24.
2
Identifying subtypes of Hypersomnolence Disorder: a clustering analysis.
Sleep Med. 2019 Dec;64:71-76. doi: 10.1016/j.sleep.2019.06.015. Epub 2019 Jul 4.
3
Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records.
BMC Med Inform Decis Mak. 2019 Apr 18;19(1):86. doi: 10.1186/s12911-019-0805-0.
4
Characterization of cancer genomic heterogeneity by next-generation sequencing advances precision medicine in cancer treatment.
Precis Clin Med. 2018 Jun;1(1):29-48. doi: 10.1093/pcmedi/pby007. Epub 2018 Jun 14.
5
Multiview Cluster Analysis Identifies Variable Corticosteroid Response Phenotypes in Severe Asthma.
Am J Respir Crit Care Med. 2019 Jun 1;199(11):1358-1367. doi: 10.1164/rccm.201808-1543OC.
6
Future Roadmaps for Precision Medicine Applied to Diabetes: Rising to the Challenge of Heterogeneity.
J Diabetes Res. 2018 Nov 27;2018:3061620. doi: 10.1155/2018/3061620. eCollection 2018.
7
Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017.
Lancet. 2018 Nov 10;392(10159):1789-1858. doi: 10.1016/S0140-6736(18)32279-7. Epub 2018 Nov 8.
8
Global, regional, and national age-sex-specific mortality for 282 causes of death in 195 countries and territories, 1980-2017: a systematic analysis for the Global Burden of Disease Study 2017.
Lancet. 2018 Nov 10;392(10159):1736-1788. doi: 10.1016/S0140-6736(18)32203-7. Epub 2018 Nov 8.
9
Developing and validating Parkinson's disease subtypes and their motor and cognitive progression.
J Neurol Neurosurg Psychiatry. 2018 Dec;89(12):1279-1287. doi: 10.1136/jnnp-2018-318337. Epub 2018 Jul 25.
10
Potential identification of vitamin B6 responsiveness in autism spectrum disorder utilizing phenotype variables and machine learning methods.
Sci Rep. 2018 Oct 4;8(1):14840. doi: 10.1038/s41598-018-33110-w.