Beesley Lauren J, Fritsche Lars G, Mukherjee Bhramar
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Stat Med. 2020 Jun 30;39(14):1965-1979. doi: 10.1002/sim.8524. Epub 2020 Mar 20.
Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from a standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.
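The abstract does not state the framework's formulas, so the following is only a minimal textbook-style sketch of the kind of bias it describes, not the authors' method or their online tool. It assumes a binary genotype G, nondifferential misclassification of disease status (sensitivity `sens`, specificity `spec` depending only on true disease D), and selection into the sample that depends only on true D (probabilities `s_case`, `s_control`). The function names `naive_or` and `implied_true_or` and the parameter `p0` (true disease probability among non-carriers) are hypothetical. `naive_or` gives the odds ratio a standard analysis of observed status on G would target; `implied_true_or` inverts it by bisection, which is the basic idea of a sensitivity analysis: fix plausible values for the unknowns and back out the true association.

```python
import math


def odds(p):
    return p / (1.0 - p)


def observed_prob(p, sens, spec, s_case, s_control):
    """P(D* = 1 | G = g, S = 1) when the true P(D = 1 | G = g) is p.

    Assumption: misclassification and selection both depend only on true D.
    """
    num = sens * s_case * p + (1.0 - spec) * s_control * (1.0 - p)
    den = s_case * p + s_control * (1.0 - p)
    return num / den


def naive_or(true_or, p0, sens=1.0, spec=1.0, s_case=1.0, s_control=1.0):
    """OR of observed disease D* on G among sampled subjects (hypothetical helper)."""
    # True disease probability in carriers implied by the true OR and p0.
    o1 = true_or * odds(p0)
    p1 = o1 / (1.0 + o1)
    q0 = observed_prob(p0, sens, spec, s_case, s_control)
    q1 = observed_prob(p1, sens, spec, s_case, s_control)
    return odds(q1) / odds(q0)


def implied_true_or(obs_or, p0, sens, spec, s_case=1.0, s_control=1.0,
                    lo=1e-3, hi=1e3, tol=1e-10):
    """Invert naive_or by bisection on log(OR).

    naive_or is increasing in the true OR when sens + spec > 1.
    """
    def gap(log_or):
        return naive_or(math.exp(log_or), p0, sens, spec,
                        s_case, s_control) - obs_or

    a, b = math.log(lo), math.log(hi)
    if gap(a) > 0 or gap(b) < 0:
        raise ValueError("observed OR not attainable under these assumptions")
    while b - a > tol:
        mid = 0.5 * (a + b)
        if gap(mid) < 0:
            a = mid  # root lies above mid
        else:
            b = mid  # root lies at or below mid
    return math.exp(0.5 * (a + b))


print(round(naive_or(2.25, 0.1, sens=0.8, spec=0.95), 3))   # 1.75
print(round(implied_true_or(1.75, 0.1, 0.8, 0.95), 3))      # 2.25
```

With a true OR of 2.25 and `p0 = 0.1`, sensitivity 0.8 and specificity 0.95 attenuate the naive OR to 1.75, and the inversion recovers 2.25. With `sens = spec = 1`, changing `s_control` relative to `s_case` leaves the OR unchanged (the familiar invariance of the OR under selection on disease status alone); once misclassification is present, the same selection does shift the naive OR.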