Office of Biostatistics, Division of Biometrics VII, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland.
Stat Med. 2019 Mar 15;38(6):956-968. doi: 10.1002/sim.8030. Epub 2018 Nov 5.
Case-crossover designs are observational study designs used to assess the postmarket safety of medical products (eg, vaccines or drugs). Because a case-crossover study is self-controlled, it offers two advantages: better control for confounding, since the design adjusts for all time-invariant confounders, whether measured or unmeasured, and potentially greater feasibility, since only data from subjects who experience an event (the cases) are required. However, self-matching also introduces correlation between case and control periods within a subject or matched unit. To estimate sample size in a case-crossover study, investigators currently use Dupont's formula (Biometrics 1988; 44:1157-1168), which was originally developed for matched case-control studies. This formula is relevant because it accounts for the correlation in exposure between cases and controls, which is expected to be high in self-controlled studies. However, we show that Dupont's formula and other currently used methods for determining sample size in case-crossover studies may be inadequate. Specifically, across a range of values in the parameter space, these formulas tend to underestimate the true required sample size as determined through simulation. We present mathematical derivations that explain where some currently used methods fail, and we propose two new sample size estimation methods that estimate the true required sample size more accurately.
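As an illustration of what determining the required sample size "through simulations" can look like, the following minimal sketch estimates power for a 1:1 case-crossover design with one case period and one control period per subject. It is not the authors' procedure. The parameterization (control-period exposure probability `p0`, odds ratio applied to the marginal exposure probabilities, and exposure correlation `phi`), the normal-approximation McNemar test, and all function names are assumptions made for this example.

```python
import math
import random


def joint_exposure_probs(p0, odds_ratio, phi):
    """Joint probabilities of (case-period, control-period) exposure.

    Assumption of this sketch: odds_ratio links the marginal exposure
    probabilities of the two periods, and phi is the correlation between
    the two binary exposure indicators.
    Returns (p11, p10, p01, p00), where p10 means exposed in the case
    period only and p01 means exposed in the control period only.
    """
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    p11 = p0 * p1 + phi * math.sqrt(p0 * (1 - p0) * p1 * (1 - p1))
    p10 = p1 - p11
    p01 = p0 - p11
    p00 = 1.0 - p11 - p10 - p01
    probs = (p11, p10, p01, p00)
    if min(probs) < 0:
        raise ValueError("phi is incompatible with these marginal probabilities")
    return probs


def simulated_power(n_cases, p0, odds_ratio, phi,
                    n_sims=2000, z_crit=1.959964, seed=0):
    """Fraction of simulated studies in which a two-sided McNemar test
    (normal approximation, 5% level by default) rejects the null."""
    rng = random.Random(seed)
    p11, p10, p01, _p00 = joint_exposure_probs(p0, odds_ratio, phi)
    rejections = 0
    for _sim in range(n_sims):
        n10 = n01 = 0
        for _subject in range(n_cases):
            u = rng.random()
            if p11 <= u < p11 + p10:
                n10 += 1          # exposed in case period only
            elif p11 + p10 <= u < p11 + p10 + p01:
                n01 += 1          # exposed in control period only
        discordant = n10 + n01
        if discordant > 0:
            # Only discordant subjects carry information in a matched test.
            z = (n10 - n01) / math.sqrt(discordant)
            if abs(z) > z_crit:
                rejections += 1
    return rejections / n_sims
```

To approximate a required sample size under these assumptions, one could increase `n_cases` until `simulated_power` reaches the target power (for example 0.8) and compare that number with the value given by a closed-form formula.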