Shortreed Susan M, Johnson Eric, Rutter Carolyn M, Kamineni Aruna, Wernli Karen J, Chubak Jessica
Biostatistics Unit, Group Health Research Institute, Seattle, WA, U.S.A.
RAND Corporation, Santa Monica, CA, U.S.A.
Obs Stud. 2016 Aug;2:51-64. Epub 2016 Sep 26.
Electronic health records and administrative databases provide rich, longitudinal data for health-related research. These data cover large, diverse populations, creating excellent research opportunities, but they have limitations. In particular, information is available only for individuals who are enrolled in a particular health system; thus, studies often exclude individuals with short enrollment histories. Such cohort restriction may cause selection bias in absolute risk estimates for the full enrollee population. We use hazard ratios (HRs) to estimate the association between length of prior enrollment and the risks of cancer and all-cause mortality. HRs different from one indicate that restricted cohorts would produce biased risk estimates for the full enrollee population. Our study sample included 170,708 enrollees of a Western Washington healthcare delivery system. In unadjusted models, individuals with 10 or more years of prior enrollment had a higher risk of cancer and death than those with fewer than 5 years of prior enrollment (HRs ranged from 1.29 to 3.01). Adjusting for age and sex accounted for much of this difference (HRs: 0.93 to 1.24). Models adjusting for additional covariates gave similar results (HRs: 0.91 to 1.14). After evaluating potential selection bias, we conclude that, in this setting, age- and sex-standardizing risk estimates can remove most of the bias due to cohort restrictions that require lengthy prior enrollment. Before generalizing estimates based on a selected sample of patients meeting prior enrollment criteria, researchers should assess the potential for selection bias.
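The standardization idea in the abstract can be illustrated with a minimal sketch of direct standardization. This is not the authors' code or analysis (the abstract reports hazard ratios, which would come from a survival model). The function names, the age and sex strata, and the toy records below are all assumptions invented for illustration. The sketch computes stratum-specific risks in a restricted cohort and reweights them to the stratum distribution of the full enrollee population.

```python
from collections import defaultdict


def crude_risk(records):
    """Proportion of records with an event; records are (stratum, event) pairs."""
    return sum(event for _, event in records) / len(records)


def stratum_risks(records):
    """Return {stratum: risk}, where risk = events / people in that stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [events, people]
    for stratum, event in records:
        counts[stratum][0] += event
        counts[stratum][1] += 1
    return {s: events / people for s, (events, people) in counts.items()}


def standardized_risk(restricted, full):
    """Directly standardize restricted-cohort risks to the full cohort's strata.

    Each stratum-specific risk from the restricted cohort is weighted by that
    stratum's share of the full cohort.
    """
    risks = stratum_risks(restricted)
    weights = defaultdict(int)
    for stratum, _ in full:
        weights[stratum] += 1
    missing = [s for s in weights if s not in risks]
    if missing:
        raise ValueError(f"restricted cohort has no one in strata: {missing}")
    total = sum(weights.values())
    return sum(weights[s] / total * risks[s] for s in weights)


# Hypothetical toy data (invented, not from the study): (age group, sex), event.
YOUNG, OLD = ("18-64", "F"), ("65+", "F")
full = [(YOUNG, 1)] * 2 + [(YOUNG, 0)] * 6 + [(OLD, 1)] * 2 + [(OLD, 0)] * 2
# The restricted cohort over-represents the older stratum, as long enrollment might.
restricted = [(YOUNG, 1)] * 1 + [(YOUNG, 0)] * 3 + [(OLD, 1)] * 2 + [(OLD, 0)] * 2

if __name__ == "__main__":
    print("crude risk, full cohort:       ", crude_risk(full))
    print("crude risk, restricted cohort: ", crude_risk(restricted))
    print("standardized restricted risk:  ", standardized_risk(restricted, full))
```

In this toy example the restricted cohort's crude risk is higher than the full cohort's only because it contains proportionally more older members; the stratum-specific risks are identical by construction, so the standardized estimate matches the full-cohort risk. Real data would match only to the extent that stratum-specific risks do not depend on prior enrollment length, which is what the adjusted HRs near one suggest in the study setting.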