Joint clustering with correlated variables.

Author information

Zhang Hongmei, Zou Yubo, Terry Will, Karmaus Wilfried, Arshad Hasan

Affiliations

School of Public Health, The University of Memphis, Memphis, TN.

Blue Cross Blue Shield of South Carolina, Columbia, SC.

Publication information

Am Stat. 2019;73(3):296-306. doi: 10.1080/00031305.2018.1424033. Epub 2018 Jul 9.

Abstract

Traditional clustering methods group either subjects or (dependent) variables, and they assume that the variables are independent. Clusters formed this way may therefore lack homogeneity. This article proposes a joint clustering method that clusters both variables and subjects. Each joint cluster, generally composed of a subset of variables and a subset of subjects, is characterized by a unique association between the dependent variables and the covariates of interest. To this end, we design a Bayesian method in which a semi-parametric model evaluates unknown relationships between possibly correlated variables and the covariates of interest, and a Dirichlet process is used to cluster subjects. Compared with existing clustering techniques, the main novelty of the method lies in its ability to improve cluster homogeneity while taking the correlations between variables into account. Through simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association between age and wheal size in reaction to allergens, we found that a certain pattern of allergic sensitization to a set of allergens has the potential to reduce the occurrence of asthma.
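The abstract states that a Dirichlet process is used to cluster subjects, which means the number of subject clusters does not have to be fixed in advance. The following is a minimal sketch of that one ingredient only, assuming a one-dimensional Gaussian mixture with known variance and a conjugate normal prior on the cluster means. It uses the standard collapsed Gibbs sampler for conjugate Dirichlet process mixtures (Algorithm 3 in Neal, 2000), not the authors' model, which additionally fits a semi-parametric regression on covariates and clusters variables. The function `dp_gibbs`, its parameters (`alpha`, `sigma2`, `mu0`, `tau02`), and the toy data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dp_gibbs(y, alpha=1.0, sigma2=1.0, mu0=0.0, tau02=100.0, n_sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet process mixture of normals.

    Hypothetical illustration (no covariates, no variable clustering):
      y_i ~ N(mu_c, sigma2) with sigma2 known,
      mu_c ~ N(mu0, tau02),
      labels ~ Chinese restaurant process with concentration alpha.
    Returns the cluster labels after the final sweep.
    """
    rng = random.Random(seed)
    labels = [0] * len(y)      # start with every subject in one cluster
    counts = {0: len(y)}       # number of subjects in each cluster
    sums = {0: sum(y)}         # sum of y within each cluster
    next_label = 1             # label used if a new cluster is opened

    for _ in range(n_sweeps):
        for i, yi in enumerate(y):
            # Remove subject i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= yi
            if counts[k] == 0:
                del counts[k], sums[k]

            # Existing cluster: size times posterior predictive density of yi.
            options, weights = [], []
            for c, n in counts.items():
                prec = 1.0 / tau02 + n / sigma2
                post_mean = (mu0 / tau02 + sums[c] / sigma2) / prec
                options.append(c)
                weights.append(n * normal_pdf(yi, post_mean, sigma2 + 1.0 / prec))

            # New cluster: alpha times prior predictive density of yi.
            options.append(next_label)
            weights.append(alpha * normal_pdf(yi, mu0, sigma2 + tau02))

            # Draw the new label in proportion to the weights.
            c = rng.choices(options, weights=weights)[0]
            if c == next_label:
                next_label += 1
                counts[c], sums[c] = 0, 0.0
            labels[i] = c
            counts[c] += 1
            sums[c] += yi
    return labels


if __name__ == "__main__":
    # Toy data: two well-separated groups of subjects.
    y = [-10.2, -9.8, -10.0, -10.1, -9.9, 9.9, 10.1, 10.0, 9.8, 10.2]
    print(dp_gibbs(y, seed=1))
```

With two well-separated groups, points from different groups should not end up in the same cluster, while the number of clusters may still vary between sweeps, because the Dirichlet process prior always leaves some probability of opening a new cluster.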

Similar articles

1. The nested joint clustering via Dirichlet process mixture model.
J Stat Comput Simul. 2019;89(5):815-830. doi: 10.1080/00949655.2019.1572756. Epub 2019 Jan 28.
2. Adjusting background noise in cluster analyses of longitudinal data.
Comput Stat Data Anal. 2017 May;109:93-104. doi: 10.1016/j.csda.2016.11.009. Epub 2016 Nov 27.
3. Dirichlet-tree multinomial mixtures for clustering microbiome compositions.
Ann Appl Stat. 2022 Sep;16(3):1476-1499. doi: 10.1214/21-aoas1552. Epub 2022 Jul 19.
4. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.
Accid Anal Prev. 2018 Apr;113:1-11. doi: 10.1016/j.aap.2018.01.010. Epub 2018 Jan 30.
5. A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.
J Neurosci Methods. 2014 Feb 15;223:85-91. doi: 10.1016/j.jneumeth.2013.12.005. Epub 2013 Dec 12.
6. Bayesian semiparametric joint models for functional predictors.
J Am Stat Assoc. 2009;104(485):26-36. doi: 10.1198/jasa.2009.0001. Epub 2012 Jan 1.

Cited by

1. Clustering Approach for Detecting Multiple Types of Adversarial Examples.
Sensors (Basel). 2022 May 18;22(10):3826. doi: 10.3390/s22103826.
