Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.

Authors

Xie Benhuai, Pan Wei, Shen Xiaotong

Affiliation

Division of Biostatistics, School of Public Health, University of Minnesota,

Publication details

Electron J Stat. 2008;2:168-212. doi: 10.1214/08-EJS194.

DOI: 10.1214/08-EJS194
PMID: 19920875
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2777718/

Abstract

Clustering analysis is one of the most widely used statistical tools in many emerging areas such as microarray data analysis. For microarray and other high-dimensional data, the presence of many noise variables may mask underlying clustering structures. Hence removing noise variables via variable selection is necessary. For simultaneous variable selection and parameter estimation, existing penalized likelihood approaches in model-based clustering analysis all assume a common diagonal covariance matrix across clusters, which however may not hold in practice. To analyze high-dimensional data, particularly those with relatively low sample sizes, this article introduces a novel approach that shrinks the variances together with means, in a more general situation with cluster-specific (diagonal) covariance matrices. Furthermore, selection of grouped variables via inclusion or exclusion of a group of variables altogether is permitted by a specific form of penalty, which facilitates incorporating subject-matter knowledge, such as gene functions in clustering microarray samples for disease subtype discovery. For implementation, EM algorithms are derived for parameter estimation, in which the M-steps clearly demonstrate the effects of shrinkage and thresholding. Numerical examples, including an application to acute leukemia subtype discovery with microarray gene expression data, are provided to demonstrate the utility and advantage of the proposed method.
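
The "shrinkage and thresholding" that the abstract attributes to the M-steps can be illustrated with a generic soft-thresholding update. The sketch below is not the update derived in the paper: it assumes the simpler setting of an L1 penalty on cluster means with a common diagonal covariance, and it covers neither cluster-specific variances nor the grouped penalty. The function and parameter names (`soft_threshold`, `penalized_mean_update`, `lam`) are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_mean_update(x, tau, sigma2, lam):
    """One M-step update of cluster means under an L1 penalty (illustrative).

    x      : (n, p) data matrix
    tau    : (n, K) posterior cluster probabilities from the E-step
    sigma2 : (p,) variances, assumed common to all clusters (a simplification)
    lam    : penalty weight (hypothetical name)
    """
    n_k = tau.sum(axis=0)                             # effective cluster sizes, shape (K,)
    mu_tilde = (tau.T @ x) / n_k[:, None]             # unpenalized weighted means, shape (K, p)
    threshold = lam * sigma2[None, :] / n_k[:, None]  # per-cluster, per-variable threshold
    return soft_threshold(mu_tilde, threshold)

# Toy data: column 0 separates the two clusters, column 1 is near-zero noise.
x = np.array([[2.0, 0.3], [2.0, 0.1], [-2.0, -0.1], [-2.0, -0.3]])
tau = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mu = penalized_mean_update(x, tau, sigma2=np.ones(2), lam=1.0)
print(mu)  # column 0 is shrunk to +/-1.5; column 1 is thresholded to zero
```

A variable whose mean is thresholded to zero in every cluster no longer contributes to separating the clusters, which is how a penalty of this kind performs variable selection during estimation.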

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/3b72efce5f97/nihms127583f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/f6a2c30d1a02/nihms127583f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/3165a891cf93/nihms127583f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/b84a83d2b135/nihms127583f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/4db925f72d18/nihms127583f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/3d72dacebb0b/nihms127583f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/6ec951b84943/nihms127583f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f09e/2777718/2f2224c4b2e1/nihms127583f8.jpg

Similar articles

1
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.
Electron J Stat. 2008;2:168-212. doi: 10.1214/08-EJS194.
2
Penalized model-based clustering with unconstrained covariance matrices.
Electron J Stat. 2009 Jan 1;3:1473-1496. doi: 10.1214/09-EJS487.
3
Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data.
Bioinformatics. 2010 Feb 15;26(4):501-8. doi: 10.1093/bioinformatics/btp707. Epub 2009 Dec 23.
4
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
5
Variable selection in penalized model-based clustering via regularization on grouped parameters.
Biometrics. 2008 Sep;64(3):921-930. doi: 10.1111/j.1541-0420.2007.00955.x. Epub 2007 Dec 20.
6
Joint Estimation of Precision Matrices in Heterogeneous Populations.
Electron J Stat. 2016;10(1):1341-1392. doi: 10.1214/16-EJS1137. Epub 2016 May 31.
7
Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data.
Bioinformatics. 2007 Dec 1;23(23):3170-7. doi: 10.1093/bioinformatics/btm488. Epub 2007 Oct 12.
8
Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.
Biometrics. 2008 Mar;64(1):115-23. doi: 10.1111/j.1541-0420.2007.00843.x. Epub 2007 Jun 30.
9
A Penalized Matrix Normal Mixture Model for Clustering Matrix Data.
Entropy (Basel). 2021 Sep 26;23(10):1249. doi: 10.3390/e23101249.
10
Variable selection for model-based high-dimensional clustering and its application to microarray data.
Biometrics. 2008 Jun;64(2):440-8. doi: 10.1111/j.1541-0420.2007.00922.x. Epub 2007 Oct 26.

Cited by

1
Outcome-guided Bayesian clustering for disease subtype discovery using high-dimensional transcriptomic data.
J Appl Stat. 2024 Jun 7;52(1):183-207. doi: 10.1080/02664763.2024.2362275. eCollection 2025.
2
Sparse kernel k-means clustering.
J Appl Stat. 2024 Jun 5;52(1):158-182. doi: 10.1080/02664763.2024.2362266. eCollection 2025.
3
Past mercury exposure and current symptoms of nervous system dysfunction in adults of a First Nation community (Canada).
Environ Health. 2022 Mar 16;21(1):34. doi: 10.1186/s12940-022-00838-y.
4
Discovering a sparse set of pairwise discriminating features in high-dimensional data.
Bioinformatics. 2021 Apr 19;37(2):202-212. doi: 10.1093/bioinformatics/btaa690.
5
Estimation of multiple networks in Gaussian mixture models.
Electron J Stat. 2016;10:1133-1154. doi: 10.1214/16-EJS1135. Epub 2016 May 2.
6
Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.
Ann Appl Stat. 2017 Jun;11(2):1011-1039. doi: 10.1214/17-AOAS1033. Epub 2017 Jul 20.
7
Meta-analytic framework for sparse K-means to identify disease subtypes in multiple transcriptomic studies.
J Am Stat Assoc. 2016;111(513):27-42. doi: 10.1080/01621459.2015.1086354. Epub 2016 May 5.
8
Statistical Significance of Clustering using Soft Thresholding.
J Comput Graph Stat. 2015;24(4):975-993. doi: 10.1080/10618600.2014.948179. Epub 2015 Dec 10.
9
Sparse Biclustering of Transposable Data.
J Comput Graph Stat. 2014;23(4):985-1008. doi: 10.1080/10618600.2013.852554.
10
Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.
J Soc Fr Statistique (2009). 2014;155(2):57-71.

References

1
Variable selection in penalized model-based clustering via regularization on grouped parameters.
Biometrics. 2008 Sep;64(3):921-930. doi: 10.1111/j.1541-0420.2007.00955.x. Epub 2007 Dec 20.
2
Variable selection for model-based high-dimensional clustering and its application to microarray data.
Biometrics. 2008 Jun;64(2):440-8. doi: 10.1111/j.1541-0420.2007.00922.x. Epub 2007 Oct 26.
3
Logistic regression for disease classification using microarray data: model selection in a large p and small n case.
Bioinformatics. 2007 Aug 1;23(15):1945-51. doi: 10.1093/bioinformatics/btm287. Epub 2007 May 31.
4
Identifying genes that contribute most to good classification in microarrays.
BMC Bioinformatics. 2006 Sep 7;7:407. doi: 10.1186/1471-2105-7-407.
5
Evaluation and comparison of gene clustering methods in microarray analysis.
Bioinformatics. 2006 Oct 1;22(19):2405-12. doi: 10.1093/bioinformatics/btl406. Epub 2006 Jul 31.
6
Semi-supervised learning via penalized mixture model with application to microarray sample classification.
Bioinformatics. 2006 Oct 1;22(19):2388-95. doi: 10.1093/bioinformatics/btl393. Epub 2006 Jul 26.
7
A data-driven clustering method for time course gene expression data.
Nucleic Acids Res. 2006 Mar 1;34(4):1261-9. doi: 10.1093/nar/gkl013. Print 2006.
8
Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data.
Bioinformatics. 2006 May 15;22(10):1259-68. doi: 10.1093/bioinformatics/btl065. Epub 2006 Feb 24.
9
Structured polychotomous machine diagnosis of multiple cancer types using gene expression.
Bioinformatics. 2006 Apr 15;22(8):950-8. doi: 10.1093/bioinformatics/btl029. Epub 2006 Feb 1.
10
Incorporating gene functions as priors in model-based clustering of microarray gene expression data.
Bioinformatics. 2006 Apr 1;22(7):795-801. doi: 10.1093/bioinformatics/btl011. Epub 2006 Jan 24.