From the Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, Pennsylvania.
Epidemiology. 2020 Jul;31(4):542-550. doi: 10.1097/EDE.0000000000001193.
Epidemiologic studies using electronic health record (EHR)-derived phenotypes as outcomes are subject to bias due to phenotyping error. In the case of dichotomous phenotypes, existing methods for misclassified outcomes can be used to reduce bias. In this article, we present a bias correction approach for EHR-derived probabilistic phenotypes: continuous predicted probabilities of the outcome of interest. This approach makes use of correction factors that can be computed by hand and do not require specialized software. We used simulation studies to investigate the performance of the proposed approach under a variety of scenarios for accuracy of the probabilistic phenotype, strength of the outcome/exposure association, and prevalence of the outcome of interest. Across all scenarios investigated, the proposed approach substantially reduced bias in association parameter estimates relative to a naive approach. We demonstrate the application of this approach to a study of pediatric type 2 diabetes using data from the PEDSnet network of children's hospitals. This straightforward correction factor can substantially reduce bias and improve the validity of EHR-based epidemiology.
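The abstract does not give the correction factors themselves. The sketch below shows why a correction of this kind can be computed by hand. It uses a standard attenuation argument that may differ from the authors' method. Assume the probabilistic phenotype P depends on exposure X only through the true binary outcome Y (nondifferential error). Then E[P | X] = mu0 + (mu1 - mu0) Pr(Y = 1 | X), where mu1 and mu0 are the mean phenotype among true cases and true non-cases. A naive risk difference computed from P is therefore attenuated by the factor (mu1 - mu0), and dividing by that factor recovers the true risk difference. The simulated data, the Beta distributions, and the functions `simulate`, `naive_rd`, and `corrected_rd` are all hypothetical. Here mu1 and mu0 are treated as known; in practice they would have to be estimated, for example from a validation sample.

```python
import random
from statistics import fmean


def simulate(n=200_000, p0=0.05, rd=0.05, seed=1):
    """Simulate (exposure, true outcome, probabilistic phenotype) triples.

    Hypothetical data-generating process: exposure x ~ Bernoulli(0.5),
    true outcome y ~ Bernoulli(p0 + rd * x), and a phenotype p that depends
    on y only (nondifferential error): Beta(8, 2) for true cases (mean 0.8)
    and Beta(1, 9) for non-cases (mean 0.1).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        y = rng.random() < p0 + rd * x
        p = rng.betavariate(8, 2) if y else rng.betavariate(1, 9)
        rows.append((x, y, p))
    return rows


def naive_rd(rows):
    """Naive risk difference: mean phenotype in exposed minus unexposed.

    Treats the predicted probability as if it were the true outcome, so the
    estimate is attenuated by (mu1 - mu0) under nondifferential error.
    """
    exposed = [p for x, _, p in rows if x == 1]
    unexposed = [p for x, _, p in rows if x == 0]
    return fmean(exposed) - fmean(unexposed)


def corrected_rd(rows, mu1, mu0):
    """Divide the naive estimate by the attenuation factor (mu1 - mu0).

    mu1 and mu0 are the mean phenotype among true cases and non-cases,
    assumed known here (e.g., estimated from a validation sample).
    """
    return naive_rd(rows) / (mu1 - mu0)


if __name__ == "__main__":
    rows = simulate()
    print("naive:", round(naive_rd(rows), 4))
    print("corrected:", round(corrected_rd(rows, mu1=0.8, mu0=0.1), 4))
```

With a true risk difference of 0.05 and mu1 - mu0 = 0.7, the naive estimate should land near 0.035, and the corrected estimate should land near 0.05, up to simulation noise. If the nondifferential assumption fails, for example when phenotype accuracy differs by exposure, this single division is no longer sufficient.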