Ross Ryan D, Shi Xu, Caram Megan E V, Tsao Phoebe A, Lin Paul, Bohnert Amy, Zhang Min, Mukherjee Bhramar
Department of Biostatistics, School of Public Health, University of Michigan.
Department of Internal Medicine, Division of Hematology/Oncology, University of Michigan Medical School.
Health Serv Outcomes Res Methodol. 2021 Jun;21(2):206-228. doi: 10.1007/s10742-020-00222-8. Epub 2020 Oct 20.
Medical insurance claims are increasingly common data sources for answering a variety of questions in biomedical research. Although these datasets offer comprehensive longitudinal characterization of disease development and progression for a potentially large number of patients, population-based inference from them requires thoughtful modifications to sample selection and analytic strategies relative to other types of studies. Along with complex selection bias and missing data issues, claims-based studies are purely observational, which limits effective understanding and characterization of the treatment differences between the groups being compared. All these issues contribute to a crisis in the reproducibility and replication of comparative findings from medical claims. This paper offers practical guidance on the analytical process; demonstrates methods for estimating causal treatment effects with propensity score methods for several types of outcomes common to such studies, such as binary, count, time-to-event and longitudinally varying measures; and aims to increase the transparency and reproducibility of reporting results from these investigations. We provide an online version of the paper with readily implementable code for the entire analysis pipeline to serve as a guided tutorial for practitioners. The online version can be accessed at https://rydaro.github.io/. The analytic pipeline is illustrated using a sub-cohort of patients with advanced prostate cancer from the large Clinformatics™ Data Mart Database (OptumInsight, Eden Prairie, Minnesota), consisting of 73 million distinct private-payer insurees from 2001 to 2016.
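To make the propensity score idea concrete for the simplest outcome type mentioned above (a binary outcome), the following is a minimal, generic sketch of inverse probability of treatment weighting (IPTW). It is not the authors' pipeline from the online tutorial. The data are simulated, the helper names `fit_propensity` and `iptw_risk_difference` are hypothetical, and the propensity model is a plain logistic regression fitted by Newton-Raphson so that only NumPy is needed. A single binary covariate `x1` raises both the chance of treatment and the outcome risk, so the naive group difference is confounded, while the weighted (Hajek-normalized) estimate should recover the simulated risk difference of 0.15.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Logistic regression of treatment T on covariates X by Newton-Raphson.
    X must already contain an intercept column. Returns fitted propensity scores."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))

def iptw_risk_difference(ps, t, y):
    """Normalized (Hajek) IPTW estimate of the average treatment effect
    on the risk-difference scale for a binary outcome y."""
    w1 = t / ps                # weights for treated patients
    w0 = (1 - t) / (1 - ps)    # weights for untreated patients
    mu1 = np.sum(w1 * y) / np.sum(w1)
    mu0 = np.sum(w0 * y) / np.sum(w0)
    return mu1 - mu0

# Hypothetical simulated cohort: x1 confounds treatment and outcome, x2 affects treatment only.
rng = np.random.default_rng(2020)
n = 50_000
x1 = rng.binomial(1, 0.5, n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x1 + 0.5 * x2))))
y = rng.binomial(1, 0.2 + 0.15 * t + 0.3 * x1)   # true risk difference = 0.15

ps = fit_propensity(X, t)
naive = y[t == 1].mean() - y[t == 0].mean()
adjusted = iptw_risk_difference(ps, t, y)
print(f"naive: {naive:.3f}  IPTW: {adjusted:.3f}  (simulated truth 0.15)")
```

In this simulation the naive difference is inflated because treated patients are more likely to have `x1 = 1`; weighting each patient by the inverse probability of the treatment actually received balances `x1` across groups. Count, time-to-event and longitudinal outcomes would reuse the same weights with an outcome-appropriate weighted model.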