
clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.

Author information

Scrucca Luca, Raftery Adrian E

Affiliations

Department of Economics, Università degli Studi di Perugia, Via A. Pascoli, 20, 06123 Perugia, Italy, URL: http://www.stat.unipg.it/luca.

Department of Statistics, University of Washington, Box 354320, Seattle, WA 98195-4320, United States of America, URL: http://www.stat.washington.edu/raftery/.

Publication information

J Stat Softw. 2018 Apr;84. doi: 10.18637/jss.v084.i01. Epub 2018 Apr 17.

DOI:10.18637/jss.v084.i01
PMID:30450020
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6238955/
Abstract

Finite mixture modeling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provides clustering information. This enables the selection of a more parsimonious model, yielding more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel, which performs subset selection for model-based clustering. An improved version of the Raftery and Dean (2006) methodology is implemented in the new release of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search. Usage of the package is presented through the discussion of several data examples.
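The stepwise greedy search mentioned in the abstract can be illustrated with a small, language-neutral sketch. This is not the clustvarsel implementation and does not fit any Gaussian mixture model: the function `stepwise_greedy_select`, the `score` callable, and the toy scoring rule `toy_score` are all hypothetical names introduced here. The sketch assumes that some scoring function returns the evidence (for example a BIC difference) that a candidate variable carries clustering information given the other selected variables, and that a positive value favors keeping the variable. Only the greedy variant is shown; the headlong variant is not.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical signature: evidence (e.g. a BIC difference) that `candidate`
# carries clustering information given the `others` already selected.
Score = Callable[[str, FrozenSet[str]], float]


def stepwise_greedy_select(variables: Iterable[str], score: Score,
                           max_iter: int = 100) -> List[str]:
    """Alternate inclusion and exclusion steps until neither changes the subset."""
    remaining = list(variables)
    selected: List[str] = []
    for _ in range(max_iter):
        changed = False
        # Inclusion step: propose the unselected variable with the largest
        # score and accept it only if the evidence is positive.
        if remaining:
            base = frozenset(selected)
            best = max(remaining, key=lambda v: score(v, base))
            if score(best, base) > 0:
                selected.append(best)
                remaining.remove(best)
                changed = True
        # Exclusion step: propose the selected variable with the smallest
        # score given the rest and drop it if the evidence is not positive.
        if selected:
            current = frozenset(selected)
            worst = min(selected, key=lambda v: score(v, current - {v}))
            if score(worst, current - {worst}) <= 0:
                selected.remove(worst)
                remaining.append(worst)
                changed = True
        if not changed:
            break
    return selected


def toy_score(candidate: str, others: FrozenSet[str]) -> float:
    """Made-up scores: 'proxy_a' is only useful while 'a' is not selected."""
    if candidate == "noise":
        return -5.0
    if candidate == "proxy_a":
        return -1.0 if "a" in others else 12.0
    return {"a": 10.0, "b": 4.0}[candidate]


print(stepwise_greedy_select(["a", "b", "proxy_a", "noise"], toy_score))
# → ['a', 'b']
```

In this toy run `proxy_a` is included first, becomes redundant once `a` enters, and is then removed by the exclusion step, which is why the search alternates both kinds of step rather than only adding variables.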
