Variable selection for clustering with Gaussian mixture models.

Authors

Maugis Cathy, Celeux Gilles, Martin-Magniette Marie-Laure

Affiliation

Department of Mathematics, University Paris-Sud 11, Orsay, France.

Publication information

Biometrics. 2009 Sep;65(3):701-9. doi: 10.1111/j.1541-0420.2008.01160.x. Epub 2009 Feb 4.

DOI: 10.1111/j.1541-0420.2008.01160.x
PMID: 19210744
Abstract

This article is concerned with variable selection for cluster analysis. The problem is regarded as a model selection problem in the model-based cluster analysis context. A model generalizing the model of Raftery and Dean (2006, Journal of the American Statistical Association 101, 168-178) is proposed to specify the role of each variable. This model does not need any prior assumptions about the linear link between the selected and discarded variables. Models are compared with Bayesian information criterion. Variable role is obtained through an algorithm embedding two backward stepwise algorithms for variable selection for clustering and linear regression. The model identifiability is established and the consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomic application highlight the interest of the procedure.
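
The BIC comparison at the heart of the abstract can be illustrated with a deliberately reduced sketch. It is not the authors' algorithm, which embeds two backward stepwise searches; it only shows the single decision of whether a candidate variable is better modelled as a clustering variable or as a linear regression on a variable already used for clustering. Everything specific here is an assumption made for illustration: two mixture components, diagonal covariances, one regressor, the convention BIC = 2 log L - (number of parameters) log n (larger is better), the simulated data, and the helper names `gmm_bic`, `regression_bic` and `bic_difference`.

```python
import math
import random

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_bic(rows, n_iter=100):
    """BIC of a 2-component diagonal Gaussian mixture fitted by EM (assumed model form)."""
    n, d = len(rows), len(rows[0])
    # Deterministic start: soft split at the median of the first coordinate.
    order = sorted(range(n), key=lambda i: rows[i][0])
    resp = [0.0] * n  # responsibility of component 1 for each row
    for rank, i in enumerate(order):
        resp[i] = 0.9 if rank >= n // 2 else 0.1
    loglik = 0.0
    for _ in range(n_iter):
        # M-step: weighted proportions, means and variances per component.
        params = []
        for k in (0, 1):
            w = [r if k == 1 else 1.0 - r for r in resp]
            nk = sum(w)
            means = [sum(wi * row[j] for wi, row in zip(w, rows)) / nk
                     for j in range(d)]
            varis = [max(sum(wi * (row[j] - means[j]) ** 2
                             for wi, row in zip(w, rows)) / nk, 1e-6)
                     for j in range(d)]
            params.append((nk / n, means, varis))
        # E-step: update responsibilities and accumulate the log-likelihood.
        loglik = 0.0
        for i, row in enumerate(rows):
            logs = [math.log(p) + sum(normal_logpdf(row[j], m[j], v[j])
                                      for j in range(d))
                    for p, m, v in params]
            top = max(logs)
            total = top + math.log(sum(math.exp(l - top) for l in logs))
            resp[i] = math.exp(logs[1] - total)
            loglik += total
    n_params = 1 + 2 * 2 * d  # one free proportion, a mean and a variance per dimension and component
    return 2 * loglik - n_params * math.log(n)

def regression_bic(y, x):
    """BIC of a Gaussian simple linear regression of y on x (intercept, slope, variance)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sigma2 = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y)) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * loglik - 3 * math.log(n)

def bic_difference(x1, x2):
    """Positive: treat x2 as a clustering variable; negative: explain x2 by regression on x1."""
    joint = gmm_bic([[a, b] for a, b in zip(x1, x2)])
    split = gmm_bic([[a] for a in x1]) + regression_bic(x2, x1)
    return joint - split

# Simulated data (an assumption, not from the article): x1 separates two groups.
rng = random.Random(0)
labels = [i % 2 for i in range(300)]
x1 = [rng.gauss(6.0 * z, 1.0) for z in labels]
relevant = [rng.gauss(6.0 * z, 1.0) for z in labels]      # carries its own group signal
redundant = [0.5 * a + rng.gauss(0.0, 0.5) for a in x1]   # linear function of x1 plus noise

diff_relevant = bic_difference(x1, relevant)
diff_redundant = bic_difference(x1, redundant)
print("relevant candidate:", diff_relevant)
print("redundant candidate:", diff_redundant)
```

In this toy setting a positive difference favours keeping the candidate among the clustering variables, and a negative one favours treating it as explained by regression. A full procedure along the lines of the abstract would repeat such comparisons inside stepwise searches and would also select which variables enter the regression.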

Similar articles

1
Variable selection for clustering with Gaussian mixture models.
Biometrics. 2009 Sep;65(3):701-9. doi: 10.1111/j.1541-0420.2008.01160.x. Epub 2009 Feb 4.
2
Bayesian variable selection method for censored survival data.
Biometrics. 1998 Dec;54(4):1475-85.
3
Gaussian process functional regression modeling for batch data.
Biometrics. 2007 Sep;63(3):714-23. doi: 10.1111/j.1541-0420.2007.00758.x.
4
A latent-class mixture model for incomplete longitudinal Gaussian data.
Biometrics. 2008 Mar;64(1):96-105. doi: 10.1111/j.1541-0420.2007.00837.x. Epub 2007 Jun 30.
5
Proportional hazards regression for cancer studies.
Biometrics. 2008 Mar;64(1):141-8. doi: 10.1111/j.1541-0420.2007.00830.x. Epub 2007 Jun 15.
6
Flexible Bayesian quantile regression for independent and clustered data.
Biostatistics. 2010 Apr;11(2):337-52. doi: 10.1093/biostatistics/kxp049. Epub 2009 Nov 30.
7
The Mizon-Richard encompassing test for the Cox and Aalen additive hazards models.
Biometrics. 2008 Mar;64(1):164-71. doi: 10.1111/j.1541-0420.2007.00840.x. Epub 2007 Jun 30.
8
Variable selection for marginal longitudinal generalized linear models.
Biometrics. 2005 Jun;61(2):507-14. doi: 10.1111/j.1541-0420.2005.00331.x.
9
Advanced statistics: linear regression, part I: simple linear regression.
Acad Emerg Med. 2004 Jan;11(1):87-93.
10
Semiparametric inference for surrogate endpoints with bivariate censored data.
Biometrics. 2008 Mar;64(1):149-56. doi: 10.1111/j.1541-0420.2007.00834.x. Epub 2007 Jul 25.

Cited by

1
VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data.
Bioinform Adv. 2025 Mar 17;5(1):vbaf055. doi: 10.1093/bioadv/vbaf055. eCollection 2025.
2
Calibration of PurpleAir low-cost particulate matter sensors: model development for air quality under high relative humidity conditions.
Atmos Meas Tech. 2024;17(22):6735-6749. doi: 10.5194/amt-17-6735-2024. Epub 2024 Nov 26.
3
Autoencoder based data clustering for identifying anomalous repetitive hand movements, and behavioral transition patterns in children.
Phys Eng Sci Med. 2025 Mar;48(1):221-238. doi: 10.1007/s13246-024-01507-9. Epub 2025 Jan 21.
4
Acceleration of Brain Atrophy and Progression From Normal Cognition to Mild Cognitive Impairment.
JAMA Netw Open. 2024 Oct 1;7(10):e2441505. doi: 10.1001/jamanetworkopen.2024.41505.
5
Morphometrics and Phylogenomics of Coca (Erythroxylum spp.) Illuminate Its Reticulate Evolution, With Implications for Taxonomy.
Mol Biol Evol. 2024 Jul 3;41(7). doi: 10.1093/molbev/msae114.
6
Semisupervised Deep Learning for the Detection of Foreign Materials on Poultry Meat with Near-Infrared Hyperspectral Imaging.
Sensors (Basel). 2023 Aug 8;23(16):7014. doi: 10.3390/s23167014.
7
Exploring the potential role of bikeshare to complement public transit: The case of San Francisco amid the coronavirus crisis.
Cities. 2023 Jun;137:104290. doi: 10.1016/j.cities.2023.104290. Epub 2023 Mar 15.
8
Unobserved classes and extra variables in high-dimensional discriminant analysis.
Adv Data Anal Classif. 2022;16(1):55-92. doi: 10.1007/s11634-021-00474-3. Epub 2022 Mar 1.
9
Digital phenotyping of sleep patterns among heterogenous samples of Latinx adults using unsupervised learning.
Sleep Med. 2021 Sep;85:211-220. doi: 10.1016/j.sleep.2021.07.023. Epub 2021 Jul 19.
10
Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics.
Stat Appl Genet Mol Biol. 2019 Dec 12;18(6):/j/sagmb.2019.18.issue-6/sagmb-2018-0065/sagmb-2018-0065.xml. doi: 10.1515/sagmb-2018-0065.