Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

Author information

Lo Kenneth, Gottardo Raphael

Affiliation

Department of Microbiology, University of Washington, Seattle, WA, USA.

Publication information

Stat Comput. 2012 Jan 1;22(1):33-52. doi: 10.1007/s11222-010-9204-1.

Abstract

Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
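
The abstract describes the model only in words. The sketch below is a minimal illustration, not the authors' implementation. It assumes strictly positive data and the standard Box-Cox form (y^λ − 1)/λ (log y when λ = 0). The abstract does not state the paper's exact parameterization, for example how non-positive values are handled or whether λ is shared across components. The sketch computes two things. The first is the density of an observation whose Box-Cox transform follows a multivariate t distribution, including the Jacobian term that maps the density back to the original scale. The second is the E-step quantities of a standard EM algorithm for a t mixture: the posterior memberships z and the weights u = (ν + p)/(ν + δ), which shrink toward zero for points with a large Mahalanobis distance δ and so downweight outliers. All function names are hypothetical. The M-step, the estimation of ν, and the selection of λ are omitted.

```python
import math

import numpy as np


def box_cox(y, lam):
    """Standard Box-Cox transform (y**lam - 1) / lam, elementwise; needs y > 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the standard Box-Cox transform requires y > 0")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


def mvt_logpdf(x, mu, sigma, nu):
    """Log-density of a p-variate t(mu, sigma, nu) for each row of x (n x p).

    Also returns the Mahalanobis distances delta_i = (x_i - mu)' sigma^-1 (x_i - mu).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    p = x.shape[1]
    diff = x - mu
    delta = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    logpdf = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
              - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet
              - 0.5 * (nu + p) * np.log1p(delta / nu))
    return logpdf, delta


def transformed_t_logpdf(y, mu, sigma, nu, lam):
    """Log-density of y (n x p) when box_cox(y, lam) follows t(mu, sigma, nu).

    The term (lam - 1) * sum_j log(y_j) is the log-Jacobian of the transform,
    so the density is expressed on the original data scale. For lam != 0 the
    transformed values are bounded on one side (at -1/lam); this sketch ignores
    the resulting truncation.
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    logpdf, delta = mvt_logpdf(box_cox(y, lam), mu, sigma, nu)
    return logpdf + (lam - 1.0) * np.log(y).sum(axis=1), delta


def e_step(y, weights, mus, sigmas, nus, lams):
    """E-step of a t mixture with per-component Box-Cox parameters (hypothetical).

    Returns posterior memberships z (n x K) and t weights u (n x K).
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    n, p = y.shape
    K = len(weights)
    log_num = np.empty((n, K))
    u = np.empty((n, K))
    for k in range(K):
        lp, delta = transformed_t_logpdf(y, mus[k], sigmas[k], nus[k], lams[k])
        log_num[:, k] = math.log(weights[k]) + lp
        # Points far from the component centre (large delta) get small u,
        # which is how the t component downweights outliers.
        u[:, k] = (nus[k] + p) / (nus[k] + delta)
    # Normalise in log space for numerical stability.
    log_num -= log_num.max(axis=1, keepdims=True)
    z = np.exp(log_num)
    z /= z.sum(axis=1, keepdims=True)
    return z, u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.lognormal(size=(5, 2))  # positive, right-skewed toy data
    z, u = e_step(y, [0.5, 0.5], [np.zeros(2), np.ones(2)],
                  [np.eye(2), np.eye(2)], [4.0, 4.0], [0.0, 0.5])
    print(z.round(3))
    print(u.round(3))
```

In a standard t-mixture EM, the M-step would update each component's mean and covariance on the transformed scale using the products z·u as observation weights. Here λ enters only through the transformed data and the Jacobian term, so in this sketch it would have to be chosen numerically, for example by maximizing the log-likelihood over a grid.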

Similar articles

2
Automated gating of flow cytometry data via robust model-based clustering.
Cytometry A. 2008 Apr;73(4):321-32. doi: 10.1002/cyto.a.20531.
3
Growth Mixture Modeling With Nonnormal Distributions: Implications for Data Transformation.
Educ Psychol Meas. 2021 Aug;81(4):698-727. doi: 10.1177/0013164420976773. Epub 2020 Dec 8.
4
Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions.
Biostatistics. 2010 Apr;11(2):317-36. doi: 10.1093/biostatistics/kxp062. Epub 2010 Jan 27.
5
A random effects meta-analysis model with Box-Cox transformation.
BMC Med Res Methodol. 2017 Jul 19;17(1):109. doi: 10.1186/s12874-017-0376-7.
6
Skew Mixture Latent State-Trait Analysis: A Monte Carlo Simulation Study on Statistical Performance.
Front Psychol. 2018 Aug 2;9:1323. doi: 10.3389/fpsyg.2018.01323. eCollection 2018.
7
Robust mixture of experts modeling using the t distribution.
Neural Netw. 2016 Jul;79:20-36. doi: 10.1016/j.neunet.2016.03.002. Epub 2016 Mar 31.
8
Bayesian analysis of nonlinear mixed-effects mixture models for longitudinal data with heterogeneity and skewness.
Stat Med. 2014 Jul 20;33(16):2830-49. doi: 10.1002/sim.6136. Epub 2014 Mar 13.
10
A joint finite mixture model for clustering genes from independent Gaussian and beta distributed data.
BMC Bioinformatics. 2009 May 29;10:165. doi: 10.1186/1471-2105-10-165.

Cited by

1
Barriers of the CNS transfer rate dynamics in patients with vascular cognitive impairment and dementia.
Front Aging Neurosci. 2024 Sep 25;16:1462302. doi: 10.3389/fnagi.2024.1462302. eCollection 2024.
2
AutoGater: a weakly supervised neural network model to gate cells in flow cytometric analyses.
Sci Rep. 2024 Oct 9;14(1):23581. doi: 10.1038/s41598-024-66936-8.
3
Determining classes of food items for health requirements and nutrition guidelines using Gaussian mixture models.
Front Nutr. 2023 Oct 13;10:1186221. doi: 10.3389/fnut.2023.1186221. eCollection 2023.
4
BAYESIAN ANALYSIS FOR IMBALANCED POSITIVE-UNLABELLED DIAGNOSIS CODES IN ELECTRONIC HEALTH RECORDS.
Ann Appl Stat. 2023 Jun;17(2):1220-1238. doi: 10.1214/22-AOAS1666. Epub 2023 May 1.
5
Addressing heterogeneous populations in latent variable settings through robust estimation.
Psychol Methods. 2023 Feb;28(1):39-60. doi: 10.1037/met0000413. Epub 2021 Oct 25.
6
Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference.
PLoS One. 2020 Mar 13;15(3):e0230101. doi: 10.1371/journal.pone.0230101. eCollection 2020.
8
Variable Selection for Skewed Model-Based Clustering: Application to the Identification of Novel Sleep Phenotypes.
J Am Stat Assoc. 2018;113(521):95-110. doi: 10.1080/01621459.2017.1330202. Epub 2018 May 16.
9
Clinical and environmental influences on metabolic biomarkers collected for newborn screening.
Clin Biochem. 2013 Jan;46(1-2):133-8. doi: 10.1016/j.clinbiochem.2012.09.013. Epub 2012 Sep 23.
10
A computational framework to emulate the human perspective in flow cytometric data analysis.
PLoS One. 2012;7(5):e35693. doi: 10.1371/journal.pone.0035693. Epub 2012 May 1.
