Lee Donghwan, Choi Dongseok, Lee Youngjo
Department of Statistics, Ewha Womans University, Seoul, Republic of Korea.
OHSU-PSU School of Public Health, Oregon Health & Science University, Portland, OR, USA.
Stat Methods Med Res. 2020 Oct;29(10):2932-2944. doi: 10.1177/0962280220913067. Epub 2020 Mar 27.
In clustering problems, latent variable models are frequently used to model the intrinsic structure of unlabeled data. These model-based clustering methods typically provide a clustering rule that minimizes the total false assignment error. In many applications, however, it is desirable to treat false assignment errors differently for particular clusters. In this paper, we introduce the false assignment rate for clustering and estimate it using the extended likelihood approach. We propose VRclust, a novel clustering rule that controls various errors differently across clusters. Real data examples illustrate how the estimated false assignment rate can be used, and a simulation study shows that the error control is consistent as the sample size increases.