Lui Kung-Jong
Department of Mathematics and Statistics, College of Sciences, San Diego State University, San Diego, CA 92182-7720, USA.
Stat Med. 2005 Jun 15;24(11):1765-76. doi: 10.1002/sim.2060.
When the number of potential controls is large relative to the number of available cases, or when little effort is needed to collect the relevant information on the controls, multiple matching is often applied in epidemiological studies to attain validity or to increase the efficiency of inference. In this paper, we focus on interval estimation of the difference in proportions under m-to-one matching. We consider four asymptotic interval estimators: the estimator using the Mantel-Haenszel (MH) point estimator directly, the estimator using the tanh^{-1}(x) transformation, the estimator derived from the Cochran-Mantel-Haenszel (CMH) test statistic, and the estimator derived from a quadratic inequality developed in this paper. We use Monte Carlo simulation to evaluate and compare the performance of these estimators. We find that the estimator using the MH estimator directly can have coverage probability below the desired confidence level when the number of matched sets is small. The estimator derived from the quadratic inequality can perform well when the underlying difference is close to 0, even for a small number of matched sets; however, its coverage probability also tends to fall below the desired confidence level when the underlying difference in proportions is large. By contrast, the estimator using the CMH statistic tends to have coverage probability above the desired confidence level when the underlying difference is small. We also find that the estimator using the tanh^{-1}(x) transformation consistently outperforms the interval estimator using the MH estimator directly. We use data on the association between induced abortions and ectopic pregnancy to illustrate the use of these estimators.
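To make the contrast between the direct interval and the tanh^{-1}(x) interval concrete, the following is a minimal sketch and not the paper's method. It assumes each matched set holds one case and m controls with a binary exposure, so the MH weights are identical across sets and the MH difference reduces to the mean of the per-set differences. The standard error used here (the empirical standard deviation of the per-set differences divided by the square root of the number of sets) is a simple stand-in, not the variance estimator derived in the paper, and the function and argument names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def difference_intervals(case_exposed, controls_exposed, m, level=0.95):
    """Sketch of a direct (Wald-type) and a tanh^{-1}-transformed interval.

    case_exposed[k]     -- 1 if the case in matched set k is exposed, else 0
    controls_exposed[k] -- number of exposed controls (0..m) in matched set k
    Assumes at least two matched sets and an estimate strictly inside (-1, 1).
    """
    # Per-set difference between the case and control exposure proportions.
    d = [x - y / m for x, y in zip(case_exposed, controls_exposed)]

    # With one case and m controls in every set, the MH weights are equal,
    # so the MH point estimate is the plain average of the per-set differences.
    est = mean(d)

    # ASSUMPTION: empirical standard error, not the paper's variance formula.
    se = stdev(d) / math.sqrt(len(d))
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Direct interval: estimate +/- z * se (may extend beyond [-1, 1]).
    direct = (est - z * se, est + z * se)

    # Transformed interval: build it on the atanh scale, where the delta
    # method gives a standard error of se / (1 - est^2), then map back.
    centre = math.atanh(est)
    half_width = z * se / (1 - est ** 2)
    transformed = (math.tanh(centre - half_width), math.tanh(centre + half_width))

    return est, direct, transformed


# Hypothetical data: 8 matched sets, 2 controls per case.
cases = [1, 1, 0, 1, 1, 0, 1, 1]
controls = [0, 1, 0, 0, 1, 0, 2, 0]
est, direct, transformed = difference_intervals(cases, controls, m=2)
print(est, direct, transformed)
```

Because tanh maps the real line into (-1, 1), the transformed limits always stay within the admissible range for a difference in proportions, whereas the direct limits can fall outside it when the estimate is near the boundary.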