Testing for group structure in high-dimensional data.

Author information

McLachlan G J, Rathnayake Suren I

Affiliation

Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia.

Publication information

J Biopharm Stat. 2011 Nov;21(6):1113-25. doi: 10.1080/10543406.2011.608342.

DOI:10.1080/10543406.2011.608342

PMID:22023680

Abstract

With the use of finite mixture models for the clustering of a data set, the crucial question of how many clusters there are in the data can be addressed by testing for the smallest number of components in the mixture model compatible with the data. We investigate the performance of a resampling approach to this latter problem in the context of high-dimensional data, where the number of variables p is extremely large relative to the number of observations n. In order to be able to fit normal mixture models to such data, some form of dimension reduction has to be performed. This raises the question of whether a practically significant bias results if the bootstrapping is undertaken solely on the basis of the reduced dimensional form of the data, rather than using the full data from which to draw the bootstrap sample replications.
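The resampling idea described in the abstract can be illustrated with a parametric bootstrap of the likelihood ratio statistic for g0 versus g0 + 1 components, carried out entirely in the reduced-dimensional space. The following is a minimal sketch and not the authors' implementation: the use of principal components as the dimension reduction, the common diagonal covariance matrix, the choice q = 2, the simulated data, and all function names (`reduce_dim`, `fit_mixture`, `bootstrap_lrt`) are assumptions made for illustration.

```python
import numpy as np

def reduce_dim(Y, q):
    """Project the n x p data onto its first q principal components (assumed reduction)."""
    Yc = Y - Y.mean(axis=0)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :q] * s[:q]

def log_joint(X, pi, mu, var):
    """log(pi_i * N(x_j; mu_i, diag(var))) for every observation j and component i."""
    return (np.log(pi) - 0.5 * np.sum(np.log(2 * np.pi * var))
            - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))

def logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))

def fit_mixture(X, g, rng, n_init=5, n_iter=200, tol=1e-6):
    """EM for a g-component normal mixture with a common diagonal covariance.
    Returns (log-likelihood, weights, means, variances) for the best random start."""
    n = X.shape[0]
    best = None
    for _ in range(n_init):
        pi = np.full(g, 1.0 / g)
        mu = X[rng.choice(n, g, replace=False)]  # random observations as starting means
        var = X.var(axis=0) + 1e-12
        prev = -np.inf
        for _ in range(n_iter):
            lj = log_joint(X, pi, mu, var)
            norm = logsumexp_rows(lj)
            ll = norm.sum()
            if ll - prev < tol:
                break
            prev = ll
            tau = np.exp(lj - norm)              # E-step: posterior probabilities
            nk = tau.sum(axis=0) + 1e-10
            pi = nk / nk.sum()                   # M-step: weights, means, variances
            mu = (tau.T @ X) / nk[:, None]
            sq = (X[:, None, :] - mu) ** 2
            var = (tau[:, :, None] * sq).sum(axis=(0, 1)) / n + 1e-12
        ll = logsumexp_rows(log_joint(X, pi, mu, var)).sum()
        if best is None or ll > best[0]:
            best = (ll, pi, mu, var)
    return best

def sample_mixture(n, pi, mu, var, rng):
    """Draw n observations from the fitted mixture (parametric bootstrap sample)."""
    z = rng.choice(len(pi), size=n, p=pi)
    return mu[z] + np.sqrt(var) * rng.standard_normal((n, mu.shape[1]))

def bootstrap_lrt(X, g0, B=99, seed=0):
    """Test H0: g = g0 against H1: g = g0 + 1 by bootstrapping -2 log lambda."""
    rng = np.random.default_rng(seed)
    ll0, pi, mu, var = fit_mixture(X, g0, rng)
    observed = 2 * (fit_mixture(X, g0 + 1, rng)[0] - ll0)
    null_stats = []
    for _ in range(B):
        Xb = sample_mixture(len(X), pi, mu, var, rng)  # sample under the fitted null model
        stat = 2 * (fit_mixture(Xb, g0 + 1, rng)[0] - fit_mixture(Xb, g0, rng)[0])
        null_stats.append(stat)
    p_value = (1 + sum(s >= observed for s in null_stats)) / (B + 1)
    return observed, p_value

# Simulated example with p much larger than n (illustrative data only).
rng = np.random.default_rng(1)
n, p = 60, 500
Y = rng.standard_normal((n, p))
Y[: n // 2, :20] += 3.0  # two groups that differ in the first 20 variables
X = reduce_dim(Y, q=2)
observed, p_value = bootstrap_lrt(X, g0=1, B=19)
print(f"-2 log lambda = {observed:.2f}, bootstrap p-value = {p_value:.3f}")
```

Here the bootstrap samples are generated from the mixture fitted to the reduced data, which is the shortcut whose possible bias the abstract questions. The alternative it mentions would draw each bootstrap sample in the full p-dimensional space and repeat the dimension reduction on every replicate.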
