
Selective inference for k-means clustering.

Author information

Chen Yiqun T, Witten Daniela M

Affiliations

Data Science Institute and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.

Departments of Statistics and Biostatistics, University of Washington, Seattle, WA 98195-4322, USA.

Publication information

J Mach Learn Res. 2023 May;24.

Abstract

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of k-means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the k-means algorithm. We show that the p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering in finite samples, and can be efficiently computed. We apply our proposal to hand-written digits data and to single-cell RNA-sequencing data.
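The inflation of the Type I error rate mentioned above can be seen in a small simulation. The following is a minimal sketch and not the p-value proposed in the paper: it draws one-dimensional data from a single normal population (so there is no true difference in means), splits the data with a simple two-cluster Lloyd's algorithm, and then applies a naive two-sample z-test to the resulting clusters. The function names `two_means_1d`, `naive_z_test`, and `naive_rejection_rate`, the one-dimensional setting, the normal approximation, and the simulation sizes are all assumptions made for illustration.

```python
import math
import random
import statistics


def two_means_1d(xs, iters=100):
    """Lloyd's algorithm with k = 2 on one-dimensional data (illustrative helper)."""
    c0, c1 = min(xs), max(xs)  # deterministic initial centers at the extremes
    labels = []
    for _ in range(iters):
        new_labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        g0 = [x for x, lab in zip(xs, labels) if lab == 0]
        g1 = [x for x, lab in zip(xs, labels) if lab == 1]
        c0, c1 = statistics.fmean(g0), statistics.fmean(g1)
    return labels


def naive_z_test(a, b):
    """Two-sided Welch-type z-test p-value (normal approximation).

    It ignores the fact that the two groups were chosen by clustering.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def naive_rejection_rate(n_sims=200, n=100, alpha=0.05, seed=0):
    """Fraction of null data sets in which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # One population only: the null hypothesis of equal means is true.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        labels = two_means_1d(xs)
        a = [x for x, lab in zip(xs, labels) if lab == 0]
        b = [x for x, lab in zip(xs, labels) if lab == 1]
        if len(a) < 2 or len(b) < 2:
            continue  # variance undefined; counted as a non-rejection
        if naive_z_test(a, b) < alpha:
            rejections += 1
    return rejections / n_sims


rate = naive_rejection_rate()
print(f"naive rejection rate under the null: {rate:.2f} (nominal level 0.05)")
```

Because the clusters are built to be separated, the naive test rejects far more often than the nominal 5% even though all observations share one mean. A selective test of the kind described in the abstract is meant to correct this by conditioning on the clustering.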


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d08a/10805457/eae94b28613e/nihms-1916887-f0002.jpg
