Zhang Hongmei, Zou Yubo, Terry Will, Karmaus Wilfried, Arshad Hasan
School of Public Health, The University of Memphis, Memphis, TN.
Blue Cross Blue Shield of South Carolina, Columbia, SC.
Am Stat. 2019;73(3):296-306. doi: 10.1080/00031305.2018.1424033. Epub 2018 Jul 9.
Traditional clustering methods group either subjects or (dependent) variables and assume that the variables are independent. Clusters formed this way can lack homogeneity. This article proposes a joint clustering method in which both variables and subjects are clustered. Within each joint cluster (in general, a subset of variables together with a subset of subjects), there is a unique association between the dependent variables and the covariates of interest. To this end, a Bayesian method is designed in which a semi-parametric model evaluates any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is used to cluster subjects. Compared with existing clustering techniques, the major novelty of the method lies in its ability to improve the homogeneity of clusters while taking the correlations between variables into account. We examine the performance and efficiency of the proposed method via simulations. Applying the method to cluster allergens and subjects based on the association between age and wheal size in reaction to allergens, we found that a certain pattern of allergic sensitization to a set of allergens has the potential to reduce the occurrence of asthma.
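To give a feel for how a Dirichlet process can cluster subjects without fixing the number of clusters in advance, the following is a minimal sketch of the Chinese restaurant process, a standard way of drawing a partition from a Dirichlet process prior. It is an illustration only and not the authors' joint clustering algorithm: it omits the semi-parametric model, the clustering of variables, and any posterior updating from data. The function name `crp_assignments` and the parameters `n_subjects`, `alpha`, and `seed` are hypothetical names chosen for this example.

```python
import random


def crp_assignments(n_subjects, alpha, seed=0):
    """Draw cluster labels for subjects from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make new
    clusters more likely. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    labels = []  # cluster label of each subject, in arrival order
    sizes = []   # current number of subjects in each cluster
    for _ in range(n_subjects):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = crp_assignments(n_subjects=20, alpha=1.0, seed=1)
    print("labels:", labels)
    print("cluster sizes:", sizes)
```

In a full Bayesian analysis, these prior weights would be combined with the likelihood of each subject's data under each cluster's model before a label is sampled; the sketch shows only the prior step.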