Xu Stanley, Shetterly Susan, Cook Andrea J, Raebel Marsha A, Goonesekera Sunali, Shoaibi Azadeh, Roy Jason, Fireman Bruce
Institute for Health Research, Kaiser Permanente Colorado, Denver, CO, USA.
Biostatistics Unit, Group Health Research Institute, Seattle, WA, USA.
Pharmacoepidemiol Drug Saf. 2016 Apr;25(4):453-61. doi: 10.1002/pds.3983. Epub 2016 Feb 15.
The objective of this study was to evaluate regression, matching, and stratification on propensity score (PS) or disease risk score (DRS) in a setting of sequential analyses where statistical hypotheses are tested multiple times.
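To illustrate why testing the same hypothesis at every look needs care, the sketch below simulates a null effect and counts how often an unadjusted two-sided 5% test rejects at any look. It is a generic illustration, not the authors' simulation: the function name `any_look_rejection_rate`, the normal test statistic, the equal-sized looks, and the fixed critical value of 1.96 are all assumptions made for the example.

```python
import math
import random


def any_look_rejection_rate(n_looks, n_sims=20000, z_crit=1.96, seed=0):
    """Share of null simulations in which |z| exceeds z_crit at any look.

    Hypothetical helper: each look adds one standard-normal increment
    (no true effect), and the cumulative z-statistic is tested again.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)   # data accrued in this look
            z = cumulative / math.sqrt(look)    # statistic on all data so far
            if abs(z) > z_crit:                 # unadjusted nominal 5% test
                rejections += 1
                break                           # a signal stops surveillance
    return rejections / n_sims


single_look = any_look_rejection_rate(1)
five_looks = any_look_rejection_rate(5)
print(f"1 look: {single_look:.3f}   5 looks: {five_looks:.3f}")
```

With one look the rejection rate should sit near the nominal 5%; with repeated unadjusted looks it rises well above that, which is why sequential designs use adjusted signaling thresholds.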
In a setting of sequential analyses, we simulated incident users and binary outcomes under varying confounding strength, outcome incidence, and treatment adoption rate. We compared Type I error rate, empirical power, and time to signal across the following confounder adjustments: (i) regression; (ii) matching of treated and comparator users (1:1 or 1:4) on PS or DRS; and (iii) stratification on PS or DRS. We estimated PS and DRS using both lookwise methods and cumulative methods (the latter using all data up to the current look). We then applied these confounder adjustments to examine the association between non-steroidal anti-inflammatory drugs and bleeding.
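As a minimal sketch of what stratification on PS does, the example below simulates a cohort with no true treatment effect and one binary confounder, estimates the PS as the treated proportion at each confounder level, and compares the crude risk ratio with a Mantel-Haenszel risk ratio pooled across PS strata. Everything here is assumed for illustration: the single confounder, the probabilities, the saturated PS model, the Mantel-Haenszel estimator, and the helper names are not taken from the study.

```python
import random
from collections import defaultdict


def simulate_null_cohort(n, seed=1):
    """Rows of (confounder, treated, outcome); treatment has no effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        treated = 1 if rng.random() < (0.6 if x else 0.2) else 0
        outcome = 1 if rng.random() < (0.15 if x else 0.05) else 0
        rows.append((x, treated, outcome))
    return rows


def estimate_ps(rows):
    """Saturated PS model: treated proportion within each confounder level."""
    counts = defaultdict(lambda: [0, 0])  # x -> [treated, total]
    for x, treated, _ in rows:
        counts[x][0] += treated
        counts[x][1] += 1
    return {x: t / total for x, (t, total) in counts.items()}


def crude_rr(rows):
    """Unadjusted risk ratio, treated versus untreated."""
    events, totals = [0, 0], [0, 0]
    for _, treated, outcome in rows:
        events[treated] += outcome
        totals[treated] += 1
    return (events[1] / totals[1]) / (events[0] / totals[0])


def ps_stratified_rr(rows, ps_by_x):
    """Mantel-Haenszel risk ratio pooled over strata of the estimated PS."""
    strata = defaultdict(lambda: [[0, 0], [0, 0]])  # ps -> [events, totals]
    for x, treated, outcome in rows:
        events, totals = strata[ps_by_x[x]]
        events[treated] += outcome
        totals[treated] += 1
    numerator = denominator = 0.0
    for events, totals in strata.values():
        size = totals[0] + totals[1]
        numerator += events[1] * totals[0] / size
        denominator += events[0] * totals[1] / size
    return numerator / denominator


cohort = simulate_null_cohort(100_000)
crude = crude_rr(cohort)
adjusted = ps_stratified_rr(cohort, estimate_ps(cohort))
print(f"crude RR: {crude:.2f}   PS-stratified RR: {adjusted:.2f}")
```

Because the confounder raises both the chance of treatment and the outcome risk, the crude ratio is biased away from 1, while the stratified estimate should fall close to the null. In a sequential setting the same calculation would be repeated at each look, on either that look's data or all data accrued so far.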
Propensity score and DRS methods had similar empirical power and time to signal. However, DRS methods yielded Type I error rates of up to 17% for 1:4 matching and 15.3% for stratification when treatment and outcome were common and the confounders' association with treatment was stronger. When treatment and outcome were not common, stratification on PS and DRS and regression yielded Type I error rates of 8-10% and inflated empirical power. When outcome and treatment were common, however, regression and stratification on PS both outperformed the matching methods, with Type I error rates close to 5%.
We suggest regression or stratification on PS when the outcome and/or treatment is common, and matching on PS with higher matching ratios when the outcome or treatment is rare or moderately rare.