

clusterBMA: Bayesian model averaging for clustering.

Affiliations

Centre for Data Science, School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia.

School of Information Science and Engineering, Yunnan University, Kunming, China.

Publication information

PLoS One. 2023 Aug 21;18(8):e0288000. doi: 10.1371/journal.pone.0288000. eCollection 2023.

DOI:10.1371/journal.pone.0288000
PMID:37603575
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10441802/
Abstract

Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a combined posterior similarity matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name. We use simulated datasets to explore the ability of the proposed technique to identify robust integrated clusters with varying levels of separation between subgroups, and with varying numbers of clusters between models. 
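The weighted-averaging step can be illustrated with a minimal sketch. It is not the clusterBMA R package's code. It assumes that each model is summarised as an N x N similarity matrix (pairwise co-clustering for a 'hard' algorithm, products of allocation probabilities for a 'soft' one) and that model weights are already available. The function names and the weights 0.6 and 0.4 are hypothetical placeholders; the abstract does not give the formula that turns internal validation criteria into approximate posterior model probabilities, and the final symmetric simplex matrix factorisation is omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def hard_similarity(labels: Sequence[int]) -> Matrix:
    """Entry (i, j) is 1.0 when points i and j share a cluster label, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def soft_similarity(probs: Sequence[Sequence[float]]) -> Matrix:
    """Entry (i, j) is sum_k p_ik * p_jk for allocation probabilities p."""
    return [
        [sum(p * q for p, q in zip(row_i, row_j)) for row_j in probs]
        for row_i in probs
    ]


def weighted_consensus(similarities: Sequence[Matrix],
                       weights: Sequence[float]) -> Matrix:
    """Weighted average of similarity matrices; weights are rescaled to sum to 1."""
    if not similarities or len(similarities) != len(weights):
        raise ValueError("need one weight per similarity matrix")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    normalised = [w / total for w in weights]
    n = len(similarities[0])
    return [
        [sum(w * s[i][j] for w, s in zip(normalised, similarities))
         for j in range(n)]
        for i in range(n)
    ]


# Two toy models over four points: one hard, one soft clustering.
model_a = hard_similarity([0, 0, 1, 1])
model_b = soft_similarity([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])

# Hypothetical weights standing in for approximate posterior model probabilities.
consensus = weighted_consensus([model_a, model_b], weights=[0.6, 0.4])
```

In this toy example the entry for points 0 and 1 stays high because both models place them together, while the entry for points 0 and 3 is close to zero. Entries where the models disagree land in between, which is how the combined matrix reflects model-based uncertainty.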
Benchmarking accuracy against four other ensemble methods previously demonstrated to be highly effective in the literature, clusterBMA matches or exceeds the performance of competing approaches under various conditions of dimensionality and cluster separation. clusterBMA substantially outperformed other ensemble methods for high dimensional simulated data with low cluster separation, with 1.16 to 7.12 times better performance as measured by the Adjusted Rand Index. We also explore the performance of this approach through a case study that aims to identify probabilistic clusters of individuals based on electroencephalography (EEG) data. In applied settings for clustering individuals based on health data, the features of probabilistic allocation and measurement of model-based uncertainty in averaged clusters are useful for clinical relevance and statistical communication.
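For readers unfamiliar with the Adjusted Rand Index used in the benchmark, the following is a minimal standard-library sketch of the usual pair-counting formula. It is not taken from the paper or its R package, and the function name and toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two partitions of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same points")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two points to form a pair")

    # Pairs of points that fall in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster within each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2          # largest attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both labelings put every point in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
one_moved = [0, 0, 1, 1, 1, 1]    # one point assigned to the other cluster

ari_relabelled = adjusted_rand_index(truth, relabelled)
ari_one_moved = adjusted_rand_index(truth, one_moved)
```

The index depends only on which points are grouped together, so renaming clusters leaves it at 1, while moving a single point lowers it. A statement such as "1.16 to 7.12 times better" presumably compares such scores as a ratio between methods.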


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e221/10441802/878a998764bb/pone.0288000.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e221/10441802/80437ef396ee/pone.0288000.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e221/10441802/8cfa6a4cd2bf/pone.0288000.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e221/10441802/08a0952b5d8f/pone.0288000.g004.jpg


