Shepherd Bryan E, Shaw Pamela A
Biostatistics, Vanderbilt University, 2525 West End, Suite 11000, Nashville, Tennessee 37203, USA.
Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Stat Commun Infect Dis. 2020 Oct 7;12(Suppl1):20190015. doi: 10.1515/scid-2019-0015. eCollection 2020 Sep 1.
Observational data derived from patient electronic health records (EHR) are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. Using these data poses challenges, particularly with regard to data quality; some of these challenges are recognized, some are unrecognized, and some are recognized but ignored. The statistical community has a great opportunity to improve inference by incorporating validation subsampling into analyses of EHR data. Methods that address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many existing statistical methods for measurement error, for example, address only relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study. We discuss some preliminary methods in this area, with a particular focus on time-to-event outcomes, and outline areas for future research.