Pairwise variable selection for high-dimensional model-based clustering.

Authors

Guo Jian, Levina Elizaveta, Michailidis George, Zhu Ji

Affiliation

Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, USA.

Publication

Biometrics. 2010 Sep;66(3):793-804. doi: 10.1111/j.1541-0420.2009.01341.x.

DOI:10.1111/j.1541-0420.2009.01341.x

PMID:19912170

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2888949/

Abstract

Variable selection for clustering is an important and challenging problem in high-dimensional data analysis. Existing variable selection methods for model-based clustering select informative variables in a "one-in-all-out" manner; that is, a variable is selected if at least one pair of clusters is separable by this variable and removed if it cannot separate any of the clusters. In many applications, however, it is of interest to further establish exactly which clusters are separable by each informative variable. To address this question, we propose a pairwise variable selection method for high-dimensional model-based clustering. The method is based on a new pairwise penalty. Results on simulated and real data show that the new method performs better than alternative approaches that use ℓ_1 and ℓ_∞ penalties and offers better interpretation.
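
The contrast between "one-in-all-out" selection and pairwise selection can be illustrated with a small numerical sketch. The abstract does not state the form of the pairwise penalty, so the code below assumes a fusion-type penalty on the absolute differences between cluster means, taken over every pair of clusters and every variable. The function names, the toy mean matrix `mu`, and the tolerance `tol` are hypothetical choices for illustration, not the authors' implementation. The ℓ_1 and ℓ_∞ penalties are written in their usual form for centered data.

```python
from itertools import combinations

import numpy as np


def l1_penalty(mu):
    """Sum of absolute cluster means (assumes variables are centered)."""
    return float(np.abs(mu).sum())


def linf_penalty(mu):
    """Per-variable maximum absolute cluster mean, summed over variables."""
    return float(np.abs(mu).max(axis=0).sum())


def pairwise_penalty(mu):
    """Assumed fusion-type penalty: sum over variables and cluster pairs
    of |mu[k, j] - mu[m, j]|."""
    n_clusters = mu.shape[0]
    return float(
        sum(np.abs(mu[k] - mu[m]).sum()
            for k, m in combinations(range(n_clusters), 2))
    )


def separable_pairs(mu, tol=1e-8):
    """For each variable, list the cluster pairs whose means differ."""
    n_clusters, n_vars = mu.shape
    return {
        j: [(k, m) for k, m in combinations(range(n_clusters), 2)
            if abs(mu[k, j] - mu[m, j]) > tol]
        for j in range(n_vars)
    }


# Toy cluster means: rows are clusters, columns are variables.
mu = np.array([
    [1.0, 0.0, 2.0],
    [1.0, 0.0, -1.0],
    [-2.0, 0.0, -1.0],
])

print(separable_pairs(mu))
print(l1_penalty(mu), linf_penalty(mu), pairwise_penalty(mu))
```

In this toy example, a one-in-all-out rule would only report that variables 0 and 2 are informative and variable 1 is not. The pairwise view adds that variable 0 separates cluster 2 from clusters 0 and 1, while variable 2 separates cluster 0 from clusters 1 and 2.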
