Stuart R. Lipsitz, Garrett M. Fitzmaurice, Debajyoti Sinha, Nathanael Hevelone, Edward Giovannucci, Jim C. Hu
Brigham and Women's Hospital, Boston, Massachusetts 02115, U.S.A.
Harvard Medical School, Boston, Massachusetts 02115, U.S.A.
Biometrics. 2015 Sep;71(3):832-40. doi: 10.1111/biom.12297. Epub 2015 Mar 11.
The test of independence of the row and column variables in a J×K contingency table is widely used in many areas of application. For complex survey samples, the standard Pearson chi-squared test is inappropriate because units within the same cluster are correlated. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test statistic does not exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, postsurgical complications data from the United States' Nationwide Inpatient Sample (NIS), a complex survey of hospitals, in 2008.
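As a rough illustration of the starting point the abstract describes, the sketch below computes the standard Pearson chi-squared statistic for a J×K table and then rescales it by a single factor. It is not the authors' Wald or score test, and it does not estimate a design effect from survey data: the function names and the `scale` argument are hypothetical, and the value 0.5 is an arbitrary placeholder standing in for whatever design-effect adjustment a survey analysis would supply.

```python
import numpy as np


def pearson_chi2(counts):
    """Return the Pearson chi-squared statistic and degrees of freedom
    for a J x K table of observed counts (all margins must be positive)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    row_totals = counts.sum(axis=1, keepdims=True)   # shape (J, 1)
    col_totals = counts.sum(axis=0, keepdims=True)   # shape (1, K)
    expected = row_totals @ col_totals / n           # counts expected under independence
    stat = ((counts - expected) ** 2 / expected).sum()
    df = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    return stat, df


def rescale_statistic(stat, scale):
    """Multiply the statistic by a design-effect factor.

    `scale` is a hypothetical input: in a real analysis it would be
    estimated from the survey design, which this sketch does not do.
    """
    return stat * scale


table = [[30, 10],
         [20, 40]]
stat, df = pearson_chi2(table)
adjusted = rescale_statistic(stat, scale=0.5)  # 0.5 is an arbitrary placeholder
print(f"Pearson X2 = {stat:.4f} on {df} df; rescaled X2 = {adjusted:.4f}")
```

The rescaled statistic would then be referred to a chi-squared distribution with the same degrees of freedom; how the factor is estimated, and what happens when a cell count is zero, are the issues the paper addresses and are outside this sketch.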