Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.

Authors

Shepherd Bryan E, Shaw Pamela A

Affiliations

Biostatistics, Vanderbilt University, 2525 West End, Suite 11000, Nashville, Tennessee 37203, USA.

Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

Stat Commun Infect Dis. 2020 Oct 7;12(Suppl1):20190015. doi: 10.1515/scid-2019-0015. eCollection 2020 Sep 1.

DOI: 10.1515/scid-2019-0015

PMID: 35880997

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9204761/

Abstract

Observational data derived from patient electronic health records (EHR) are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. There are challenges to using these data, in particular with regard to data quality; some are recognized, some unrecognized, and some recognized but ignored. There are great opportunities for the statistical community to improve inference by incorporating validation subsampling into analyses of EHR data. Methods to address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many existing statistical methods, for example those for measurement error, address only relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study. We will discuss some preliminary methods in this area with a particular focus on time-to-event outcomes and outline areas of future research.

Similar articles

Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.

Stat Commun Infect Dis. 2020 Oct 7;12(Suppl1):20190015. doi: 10.1515/scid-2019-0015. eCollection 2020 Sep 1.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Improved generalized raking estimators to address dependent covariate and failure-time outcome error.

Biom J. 2021 Jun;63(5):1006-1027. doi: 10.1002/bimj.202000187. Epub 2021 Mar 11.

Raking and regression calibration: Methods to address bias from correlated covariate and time-to-event error.

Stat Med. 2021 Feb 10;40(3):631-649. doi: 10.1002/sim.8793. Epub 2020 Nov 2.

Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples, and multiple imputation.

Ann Appl Stat. 2020 Jun;14(2):1045-1061. doi: 10.1214/20-aoas1343. Epub 2020 Jun 29.

Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification.

Biometrics. 2022 Mar;78(1):214-226. doi: 10.1111/biom.13400. Epub 2020 Dec 3.

Factors Influencing Data Quality in Electronic Health Record Systems in 50 Health Facilities in Rwanda and the Role of Clinical Alerts: Cross-Sectional Observational Study.

JMIR Public Health Surveill. 2024 Jul 3;10:e49127. doi: 10.2196/49127.

Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors.

Stat Med. 2021 Feb 10;40(3):725-738. doi: 10.1002/sim.8799. Epub 2020 Nov 3.

A method for cohort selection of cardiovascular disease records from an electronic health record system.

Int J Med Inform. 2017 Jun;102:138-149. doi: 10.1016/j.ijmedinf.2017.03.015. Epub 2017 Mar 30.

Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: Empirical illustration using breast cancer recurrence.

Pharmacoepidemiol Drug Saf. 2019 Feb;28(2):264-268. doi: 10.1002/pds.4680. Epub 2018 Oct 30.

Cited by

The Lifecycle of Electronic Health Record Data in HIV-Related Big Data Studies: Qualitative Study of Bias Instances and Potential Opportunities for Minimization.

J Med Internet Res. 2025 Aug 7;27:e71388. doi: 10.2196/71388.

Adopting Data to Care to Identify and Address Gaps in Services for Children and Adolescents Living With HIV in Mozambique.

Glob Health Sci Pract. 2024 Apr 29;12(2). doi: 10.9745/GHSP-D-23-00130.

Optimal sampling for design-based estimators of regression models.

Stat Med. 2022 Apr 15;41(8):1482-1497. doi: 10.1002/sim.9300. Epub 2022 Jan 6.

References

Two-Phase Sampling Designs for Data Validation in Settings with Covariate Measurement Error and Continuous Outcome.

J R Stat Soc Ser A Stat Soc. 2021 Oct;184(4):1368-1389. doi: 10.1111/rssa.12689. Epub 2021 Apr 15.

An approximate quasi-likelihood approach for error-prone failure time outcomes and exposures.

Stat Med. 2021 Oct 15;40(23):5006-5024. doi: 10.1002/sim.9108. Epub 2021 Jun 22.

Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort.

Biometrics. 2022 Dec;78(4):1674-1685. doi: 10.1111/biom.13512. Epub 2021 Aug 1.

Optimal Designs of Two-Phase Studies.

J Am Stat Assoc. 2020;115(532):1946-1959. doi: 10.1080/01621459.2019.1671200. Epub 2019 Oct 29.

Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors.

Stat Med. 2021 Feb 10;40(3):725-738. doi: 10.1002/sim.8799. Epub 2020 Nov 3.

Raking and regression calibration: Methods to address bias from correlated covariate and time-to-event error.

Stat Med. 2021 Feb 10;40(3):631-649. doi: 10.1002/sim.8793. Epub 2020 Nov 2.

Regression calibration to correct correlated errors in outcome and exposure.

Stat Med. 2021 Jan 30;40(2):271-286. doi: 10.1002/sim.8773. Epub 2020 Oct 21.

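Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples, and multiple imputation.

Ann Appl Stat. 2020 Jun;14(2):1045-1061. doi: 10.1214/20-aoas1343. Epub 2020 Jun 29.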

Multiple-Imputation Variance Estimation in Studies With Missing or Misclassified Inclusion Criteria.

Am J Epidemiol. 2020 Dec 1;189(12):1628-1632. doi: 10.1093/aje/kwaa153.

Self-audits as alternatives to travel-audits for improving data quality in the Caribbean, Central and South America network for HIV epidemiology.

J Clin Transl Sci. 2019 Dec 26;4(2):125-132. doi: 10.1017/cts.2019.442. eCollection 2020 Apr.
