Franklin Jessica M, Schneeweiss Sebastian, Polinski Jennifer M, Rassen Jeremy A
Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 1620 Tremont St., Suite 3030, Boston, MA 02120, USA.
Comput Stat Data Anal. 2014 Apr;72:219-226. doi: 10.1016/j.csda.2013.10.018.
Longitudinal healthcare claims databases are frequently used for studying the comparative safety and effectiveness of medications, but results from these studies may be biased due to residual confounding. It is unclear whether methods for confounding adjustment that have been shown to perform well in small, simple nonrandomized studies are applicable to the large, complex pharmacoepidemiologic studies created from secondary healthcare data. Ordinary simulation approaches for evaluating the performance of statistical methods do not capture important features of healthcare claims. A statistical framework for creating replicated simulation datasets from an empirical cohort study in electronic healthcare claims data is developed and validated. The approach resamples the observed covariate and exposure data, without modification, in all simulated datasets to preserve the associations among these variables. Repeated outcomes are simulated using a true treatment effect of the investigator's choice and the baseline hazard function estimated from the empirical data. As an example, this framework is applied to a study of high- versus low-intensity statin use and cardiovascular outcomes. The simulated data are based on real data drawn from Medicare Parts A and B linked with a prescription drug insurance claims database maintained by Caremark. Properties of the data simulated using this framework are compared with the empirical data on which the simulations were based. In addition, the simulated datasets are used to compare variable selection strategies for confounder adjustment via the propensity score, including high-dimensional approaches that could not be evaluated with ordinary simulation methods. The simulated datasets are found to closely resemble the observed complex data structure but have the advantage of an investigator-specified exposure effect.
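The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `simulate_plasmode`, the coefficient vector `beta`, and the toy cohort are hypothetical, and a constant baseline hazard (exponential event times) stands in for the baseline hazard function that the paper estimates from the empirical data. What the sketch does keep is the central design choice: each simulated patient's covariates and exposure are drawn together, unmodified, from the observed cohort, and only the outcome is generated under an investigator-chosen treatment effect.

```python
import math
import random


def simulate_plasmode(covariates, exposure, beta, log_hr,
                      base_rate, censor_time, n, seed=0):
    """Draw one simulated cohort of size n (illustrative sketch only).

    covariates:  observed covariate rows (lists of floats)
    exposure:    observed 0/1 exposure indicators, aligned with covariates
    beta:        hypothetical covariate log hazard ratios for the outcome
    log_hr:      investigator-chosen true log hazard ratio for exposure
    base_rate:   constant baseline hazard (an assumption of this sketch)
    censor_time: administrative end of follow-up
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Resample one observed patient with replacement.
        i = rng.randrange(len(covariates))
        # Covariates and exposure stay paired and unmodified, which
        # preserves the observed associations among them.
        x, a = covariates[i], exposure[i]
        # Proportional hazards: baseline rate scaled by exp(linear predictor).
        linear = log_hr * a + sum(b * v for b, v in zip(beta, x))
        rate = base_rate * math.exp(linear)
        t = rng.expovariate(rate)  # event time under a constant hazard
        rows.append({
            "x": x,
            "a": a,
            "time": min(t, censor_time),
            "event": t <= censor_time,
        })
    return rows


# Toy cohort standing in for the empirical claims data.
covariates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
exposure = [1, 0, 1, 0]
sim = simulate_plasmode(covariates, exposure, beta=[0.3, -0.2],
                        log_hr=math.log(0.8), base_rate=0.05,
                        censor_time=10.0, n=1000, seed=42)
```

Because the true hazard ratio (here 0.8) is known by construction, an estimate obtained from `sim` with any confounding-adjustment method can be compared directly against it. Repeating the call with different seeds gives replicated datasets for assessing bias and variability.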