How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research.

Authors

Gergely Bence, Vargha András

Affiliations

Károli Gáspár University, Budapest, Hungary.

University of Amsterdam, Amsterdam, The Netherlands.

Publication details

J Pers Oriented Res. 2021 Aug 26;7(1):22-35. doi: 10.17505/jpor.2021.23449. eCollection 2021.

DOI:10.17505/jpor.2021.23449
PMID:34548917
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8411881/
Abstract

Model-based cluster analysis (MBCA) was developed to automate the model-selection step of traditional exploratory clustering methods, a step that is often subjective. It is a form of finite mixture modelling that assumes the data come from a mixture of subpopulations, each following a given distribution, typically multivariate normal; cluster analysis then amounts to exploring this underlying mixture structure. In MBCA, determining the number of clusters and the best clustering model is a statistical model-selection problem in which models with different numbers and types of component distributions are compared. MBCA evaluates each fitted model with the likelihood-based Bayesian Information Criterion (BIC) and accepts the model with the highest BIC value as the final solution. The present study investigates the adequacy of automatic model selection with BIC in MBCA, together with suggested alternatives such as the Integrated Completed Likelihood (ICL) criterion and Baudry's method. A further aim is to refine these procedures with so-called quality coefficients (QCs), borrowed from methodological advances in exploratory cluster analysis, to help choose an appropriate cluster structure (CLS), and to compare how efficiently MBCA and various other clustering methods identify a theoretical CLS. The analyses are restricted to two classification situations typical of person-oriented studies: (1) an example data set with a perfect theoretical CLS of seven types (seven completely homogeneous clusters), used to generate three data sets with varying degrees of measurement error added to the original values, and (2) three additional data sets based on another perfect theoretical CLS with four types.
It was found that the automatic decision rarely led to an optimal solution. However, optimal solutions were achieved in both classification situations studied by dropping solutions with irregular BIC curves and by using different QCs to choose among the solutions generated by MBCA and by fusing close clusters. With this refined procedure, the cluster solutions revealed by MBCA often proved at least as good as those of various hierarchical and k-center clustering methods. MBCA was clearly superior in identifying four-type CLS models. In identifying seven-type CLS models, MBCA performed at a level similar to the best of the other clustering methods (such as k-means) only when the reliability of the input variables was high or moderate; otherwise it was slightly less efficient.
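The BIC-based selection step summarised above can be illustrated with a small sketch. It is not the authors' procedure and not mclust's implementation: it fits univariate Gaussian mixtures by EM for K = 1 to 4 under two covariance settings, labelled "E" (equal variances) and "V" (varying variances) in the style of mclust's model names, computes BIC in the higher-is-better form BIC = 2 log L - p log n, and returns the (model, K) pair with the largest value. The helper names `fit_gmm_1d` and `select_by_bic`, the chunk-based starting values, the variance floor and the simulated data are all assumptions made for illustration.

```python
import math
import random


def fit_gmm_1d(x, k, equal_var, n_iter=200, tol=1e-6):
    """Fit a univariate k-component Gaussian mixture by EM (hypothetical helper).

    Returns (log-likelihood, number of free parameters, posterior matrix).
    """
    n = len(x)
    xs = sorted(x)
    # Deterministic start: split the sorted data into k contiguous chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    floor = 1e-6  # variance floor guarding against degenerate components
    variances = [max(sum((v - m) ** 2 for v in c) / len(c), floor)
                 for c, m in zip(chunks, means)]
    weights = [len(c) / n for c in chunks]
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: log weighted densities, normalised with log-sum-exp.
        resp, loglik = [], 0.0
        for xi in x:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * s2)
                    - (xi - m) ** 2 / (2 * s2)
                    for w, m, s2 in zip(weights, means, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += lse
            resp.append([math.exp(v - lse) for v in logs])
        if loglik - prev < tol:
            break  # log-likelihood has stopped increasing
        prev = loglik
        # M-step: re-estimate weights, means and variances.
        nk = [max(sum(r[j] for r in resp), 1e-10) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * xi for r, xi in zip(resp, x)) / nk[j]
                 for j in range(k)]
        ss = [sum(r[j] * (xi - means[j]) ** 2 for r, xi in zip(resp, x))
              for j in range(k)]
        if equal_var:
            variances = [max(sum(ss) / n, floor)] * k
        else:
            variances = [max(ss[j] / nk[j], floor) for j in range(k)]
    # Free parameters: k means, k - 1 weights, 1 ("E") or k ("V") variances.
    n_params = 2 * k - 1 + (1 if equal_var else k)
    return loglik, n_params, resp


def select_by_bic(x, max_k=4):
    """Return the (model, k) pair with the highest BIC and the full BIC table."""
    table = {}
    for k in range(1, max_k + 1):
        for name, equal_var in (("E", True), ("V", False)):
            loglik, p, _ = fit_gmm_1d(x, k, equal_var)
            table[(name, k)] = 2 * loglik - p * math.log(len(x))
    best = max(table, key=table.get)
    return best, table


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: three well-separated groups of 100 cases each.
    x = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(100)]
    best, table = select_by_bic(x)
    for key in sorted(table):
        print(key, round(table[key], 1))
    print("highest BIC:", best)
```

The abstract reports that accepting the maximum automatically rarely gave the optimal solution on the study's data, which is why the sketch prints the whole table of BIC values (the BIC curve across K) rather than only its maximum.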

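ICL and Baudry's method both work with the posterior membership probabilities t_ik of a fitted mixture. The sketch below is illustrative and rests on two assumptions: ICL is taken as ICL = BIC - 2 Ent, where Ent = -sum_i sum_k t_ik log t_ik is the classification entropy (consistent with the higher-is-better BIC convention; software may instead use a MAP-based variant), and one merging step of Baudry's method is implemented as combining the pair of components whose union gives the largest drop in entropy. Deciding how many merged clusters to keep, which Baudry's method also addresses, is left out; the function names and the toy posterior matrix are hypothetical.

```python
import math
from itertools import combinations


def entropy(post):
    """Classification entropy -sum t log t over a posterior matrix (rows = cases)."""
    return -sum(t * math.log(t) for row in post for t in row if t > 0)


def icl(bic, post):
    """ICL as BIC penalised by twice the entropy (higher is better)."""
    return bic - 2 * entropy(post)


def merge_columns(post, a, b):
    """Combine components a and b by summing their posterior probabilities."""
    return [[row[a] + row[b]] + [t for j, t in enumerate(row) if j not in (a, b)]
            for row in post]


def best_merge(post):
    """Return the pair of components whose merger most reduces the entropy."""
    base = entropy(post)
    k = len(post[0])
    return max(combinations(range(k), 2),
               key=lambda ab: base - entropy(merge_columns(post, *ab)))


def combine(post, n_clusters):
    """Merge components hierarchically until n_clusters remain."""
    while len(post[0]) > n_clusters:
        a, b = best_merge(post)
        post = merge_columns(post, a, b)
    return post


if __name__ == "__main__":
    # Toy posteriors: components 0 and 1 overlap, component 2 is well separated.
    post = [[0.5, 0.5, 0.0],
            [0.4, 0.6, 0.0],
            [0.0, 0.0, 1.0],
            [0.0, 0.1, 0.9]]
    print(best_merge(post))
    print(round(entropy(post), 3), round(icl(-100.0, post), 3))
    print(combine(post, 2))
```

Because overlapping components produce posteriors near 0.5/0.5, both the ICL penalty and the merging criterion react to the same quantity, which is the sense in which both approaches favour well-separated clusters over the raw mixture components.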
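The first classification situation adds measurement error of varying size to a perfect seven-type structure. The abstract does not say how this was done; the sketch below assumes the classical test theory relation reliability = var_T / (var_T + var_E), so that for a target reliability r the error standard deviation is sd_E = sd_T * sqrt((1 - r) / r), applied column by column with independent normal errors. The prototype values, group sizes and function names are hypothetical.

```python
import math
import random
import statistics


def error_sd(true_sd, reliability):
    """Error SD that yields the target reliability under classical test theory."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return true_sd * math.sqrt((1 - reliability) / reliability)


def add_measurement_error(data, reliability, seed=0):
    """Add independent normal error to each column of a cases-by-variables list."""
    rng = random.Random(seed)
    columns = list(zip(*data))
    sds = [error_sd(statistics.pstdev(col), reliability) for col in columns]
    return [[v + rng.gauss(0.0, sd) for v, sd in zip(row, sds)] for row in data]


if __name__ == "__main__":
    # Hypothetical perfect structure: 3 types x 20 identical cases on 4 variables.
    prototypes = [[1, 1, 5, 5], [5, 5, 1, 1], [3, 3, 3, 3]]
    truth = [list(p) for p in prototypes for _ in range(20)]
    noisy = add_measurement_error(truth, reliability=0.8)
    print(noisy[0])
```

Calling the function with progressively lower reliabilities produces noisier versions of the same theoretical CLS, mirroring the "high", "moderate" and lower reliability conditions the abstract refers to.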

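The abstract mentions two kinds of evaluation: quality coefficients computed from the data and a partition, and agreement between a recovered partition and the theoretical CLS. As an illustration only, the sketch computes the explained error sum of squares percentage (EESS%), assumed here to be 100 * (SST - SSW) / SST, as one example of a QC from the person-oriented literature, and the adjusted Rand index (ARI) as one standard agreement measure. The abstract does not state which QCs or agreement measures the study used, so both choices, and the toy data, are assumptions.

```python
from collections import Counter
from math import comb


def eess_percent(data, labels):
    """EESS%: share of the total sum of squares explained by the partition."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sum_sq(rows, centre):
        return sum((v - c) ** 2 for row in rows for v, c in zip(row, centre))

    sst = sum_sq(data, centroid(data))
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    ssw = sum(sum_sq(rows, centroid(rows)) for rows in groups.values())
    return 100.0 * (sst - ssw) / sst


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same cases."""
    pairs = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both labelings all-in-one or all-singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    data = [[1, 1], [1, 2], [6, 5], [6, 6]]  # toy cases on two variables
    theoretical = [0, 0, 1, 1]
    recovered = [1, 1, 0, 0]  # same grouping, different label names
    print(round(eess_percent(data, theoretical), 2))
    print(adjusted_rand_index(theoretical, recovered))
```

ARI ignores label names, so a method that recovers the theoretical types under different cluster numbers still scores 1, while EESS% needs no theoretical structure and can therefore guide the choice among competing MBCA solutions.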
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/2631050ccb79/JPOR-7-1-23449-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/2d44e9f7cbb3/JPOR-7-1-23449-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/8be80ed4bbe5/JPOR-7-1-23449-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/6f8f4bf44b18/JPOR-7-1-23449-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/36e35573f587/JPOR-7-1-23449-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/6c9e30b1bd54/JPOR-7-1-23449-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/1a57569eeba6/JPOR-7-1-23449-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/c3874e1eaf18/JPOR-7-1-23449-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d9eb/8411881/faa7801ef469/JPOR-7-1-23449-g009.jpg

Similar articles

1
How to Use Model-Based Cluster Analysis Efficiently in Person-Oriented Research.
J Pers Oriented Res. 2021 Aug 26;7(1):22-35. doi: 10.17505/jpor.2021.23449. eCollection 2021.
2
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.
3
A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data.
BMC Bioinformatics. 2009 May 29;10:165. doi: 10.1186/1471-2105-10-165.
4
Subtyping of children with developmental dyslexia via bootstrap aggregated clustering and the gap statistic: comparison with the double-deficit hypothesis.
Int J Lang Commun Disord. 2007 Jan-Feb;42(1):77-95. doi: 10.1080/13682820600806680.
5
Combining Mixture Components for Clustering.
J Comput Graph Stat. 2010 Jun 1;19(2):332-353. doi: 10.1198/jcgs.2010.08111.
6
Exploring Types of Parent Attachment via the Clustering Modules of a New Free Statistical Software, ROP-R.
J Pers Oriented Res. 2024 May 23;10(1):1-15. doi: 10.17505/jpor.2024.26255. eCollection 2024.
7
Revitalizing the typological approach: Some methods for finding types.
J Pers Oriented Res. 2017 Nov 1;3(1):49-62. doi: 10.17505/jpor.2017.04. eCollection 2017.
8
Assessing variation in life-history tactics within a population using mixture regression models: a practical guide for evolutionary ecologists.
Biol Rev Camb Philos Soc. 2017 May;92(2):754-775. doi: 10.1111/brv.12254. Epub 2016 Mar 1.
9
CHull as an alternative to AIC and BIC in the context of mixtures of factor analyzers.
Behav Res Methods. 2013 Sep;45(3):782-91. doi: 10.3758/s13428-012-0293-y.

Cited by

1
Exploring Types of Parent Attachment via the Clustering Modules of a New Free Statistical Software, ROP-R.
J Pers Oriented Res. 2024 May 23;10(1):1-15. doi: 10.17505/jpor.2024.26255. eCollection 2024.

References

1
Revitalizing the typological approach: Some methods for finding types.
J Pers Oriented Res. 2017 Nov 1;3(1):49-62. doi: 10.17505/jpor.2017.04. eCollection 2017.
2
mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models.
R J. 2016 Aug;8(1):289-317.
3
Combining Mixture Components for Clustering.
J Comput Graph Stat. 2010 Jun 1;19(2):332-353. doi: 10.1198/jcgs.2010.08111.