Chen Yiqun T, Gao Lucy L
Department of Biomedical Data Science, Stanford University.
Department of Statistics, University of British Columbia, November 29, 2023.
ArXiv. 2023 Nov 27:arXiv:2311.16375v1.
For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test based on the proposed p-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
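The inflation of the Type I error rate described in the abstract can be reproduced with a small simulation. The sketch below is illustrative only and is not the authors' proposed test: it draws a single feature from one normal distribution (so there are no true clusters), splits the observations with an exact one-dimensional 2-means clustering, and then applies a classical two-sided z-test to the estimated clusters. The function names, the sample size, the known unit variance, and the use of a z-test are all assumptions made for this example.

```python
import math
import random


def two_means_split(values):
    """Exact 2-means clustering of a single feature (hypothetical helper).

    In one dimension the optimal 2-means partition is a split of the
    sorted values, so we scan every split point and keep the one with
    the smallest within-cluster sum of squares.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    best_cut, best_wss = 1, float("inf")
    left_sum = left_sq = 0.0
    for cut in range(1, n):
        left_sum += xs[cut - 1]
        left_sq += xs[cut - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        wss = (left_sq - left_sum**2 / cut) + (right_sq - right_sum**2 / (n - cut))
        if wss < best_wss:
            best_cut, best_wss = cut, wss
    return xs[:best_cut], xs[best_cut:]


def naive_z_test_pvalue(group_a, group_b, sigma=1.0):
    """Classical two-sided z-test for a difference in means (sigma assumed known)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    se = sigma * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(cluster_first, n=30, reps=500, alpha=0.05, seed=0):
    """Fraction of null datasets on which the classical test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Null data: every observation has the same mean, so no true clusters exist.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if cluster_first:
            a, b = two_means_split(x)  # groups chosen by looking at the data
        else:
            a, b = x[: n // 2], x[n // 2 :]  # groups fixed before seeing the data
        rejections += naive_z_test_pvalue(a, b) < alpha
    return rejections / reps


if __name__ == "__main__":
    print("groups fixed in advance:", rejection_rate(cluster_first=False))
    print("groups from clustering: ", rejection_rate(cluster_first=True))
```

With groups fixed in advance, the rejection rate should stay near the nominal level of 0.05. With groups chosen by clustering the same data, the classical test rejects far more often, because the clustering step itself forces the two group means apart. A selective test of the kind the abstract proposes is designed to account for that selection step.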