Berrett Thomas B, Samworth Richard J
University of Warwick, Coventry CV4 7AL, UK.
University of Cambridge, Cambridge CB2 1TN, UK.
Proc Math Phys Eng Sci. 2021 Dec;477(2256):20210549. doi: 10.1098/rspa.2021.0549. Epub 2021 Dec 8.
We present the U-statistic permutation (USP) test of independence in the context of discrete data displayed in a contingency table. Either Pearson's χ²-test of independence or the G-test is typically used for this task, but we argue that these tests have serious deficiencies, both in their inability to control the size of the test and in their power properties. By contrast, the USP test is guaranteed to control the size of the test at the nominal level for all sample sizes, has no issues with small (or zero) cell counts, and is able to detect distributions that violate independence in only a minimal way. The test statistic is derived from a U-statistic estimator of a natural population measure of dependence, and we prove that this is the unique minimum variance unbiased estimator of this population quantity. The practical utility of the USP test is demonstrated both on simulated data, where its power can be dramatically greater than that of Pearson's test, the G-test and Fisher's exact test, and on real data. The USP test is implemented in the R package USP.
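The permutation idea behind the size guarantee can be illustrated with a minimal Python sketch. It is not the USP test itself: the abstract does not give the U-statistic formula, so the sketch swaps in a simple plug-in estimate of the squared distance between the joint cell probabilities and the product of the marginals, and that choice of dependence measure is an assumption. The function names `dependence_statistic` and `permutation_test` are hypothetical and are not part of the R package USP.

```python
import numpy as np


def dependence_statistic(x, y, n_rows, n_cols):
    """Plug-in estimate of sum_ij (p_ij - p_i. * p_.j)^2 (an assumed measure)."""
    table = np.zeros((n_rows, n_cols))
    np.add.at(table, (x, y), 1.0)  # accumulate counts into the contingency table
    p = table / table.sum()  # empirical joint cell probabilities
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # product of marginals
    return float(((p - expected) ** 2).sum())


def permutation_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for independence of two integer-coded samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n_rows = int(x.max()) + 1
    n_cols = int(y.max()) + 1
    observed = dependence_statistic(x, y, n_rows, n_cols)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks any dependence while keeping both marginals fixed.
        y_perm = rng.permutation(y)
        if dependence_statistic(x, y_perm, n_rows, n_cols) >= observed:
            exceed += 1
    # Counting the observed statistic among the permutations keeps the
    # p-value valid at the nominal level for any sample size.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical data: paired category labels coded as 0, 1, 2, ...
    x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    y = [0, 0, 1, 1, 1, 0, 2, 2, 2, 1]
    stat, p = permutation_test(x, y)
    print(f"statistic = {stat:.4f}, permutation p-value = {p:.3f}")
```

Because the p-value is computed from permutations of the observed labels rather than from an asymptotic reference distribution, the sketch runs unchanged on tables with small or zero cell counts.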