Chen Yiqun T, Gao Lucy L
Department of Biomedical Data Science, Stanford University, 450 Serra Mall, Stanford, CA 94305, United States.
Department of Statistics, University of British Columbia, 3182 Earth Sciences Building, 2207 Main Mall, Vancouver, BC V6T 1Z4, Canada.
Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae046.
For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common interpretation and validation approach involves testing for differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the same data are used both to estimate the clusters and to test for a difference between them. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation studies and demonstrate its use on single-cell RNA-sequencing data.
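The inflation of the Type I error rate described above can be illustrated with a small simulation. The following is a minimal sketch, not the proposed test: it draws data with no true clusters, splits them with 2-means, and then applies a naive two-sample test to one feature. The function names (`two_means`, `naive_two_sample_p`, `naive_rejection_rate`), the sample sizes, and the use of a normal-approximation z-test in place of a t-test are all assumptions made for illustration.

```python
import math
import random


def two_means(points, rng, n_iter=20):
    """Lloyd's algorithm with k=2 on a list of (x, y) tuples; returns 0/1 labels."""
    centers = rng.sample(points, 2)  # two distinct observations as starting centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def naive_two_sample_p(a, b):
    """Two-sided p-value of a Welch-type z-test (normal approximation, an
    illustrative stand-in for the classical t-test)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return math.erfc(abs(z) / math.sqrt(2))


def naive_rejection_rate(n_sims=200, n=60, alpha=0.05, seed=0):
    """Fraction of null data sets in which the naive test rejects for feature 1."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Global null: one Gaussian population, so no true clusters exist.
        points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        labels = two_means(points, rng)
        a = [p[0] for p, lab in zip(points, labels) if lab == 0]
        b = [p[0] for p, lab in zip(points, labels) if lab == 1]
        if len(a) < 2 or len(b) < 2:
            continue  # degenerate split; counted as a non-rejection
        if naive_two_sample_p(a, b) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("naive rejection rate under the null:", naive_rejection_rate())
```

If the naive test were valid, the printed rate would be close to the nominal level of 0.05. In this setup it should come out far higher, since the clustering step itself separates the observations along the feature being tested.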