Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort.

Affiliations

Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

Biometrics. 2022 Dec;78(4):1674-1685. doi: 10.1111/biom.13512. Epub 2021 Aug 1.

PMID: 34213008

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8720323/

Abstract

Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error-prone, and directly using them in biomedical research could bias estimation and give misleading results. A cost-effective solution is the two-phase design, under which the error-prone variables are observed for all patients during Phase I, and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error-prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and covariates to be error-prone and these errors to be correlated, and selection of the Phase II sample can depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.
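
The two-phase idea described in the abstract can be illustrated with a deliberately simplified toy example. The sketch below is not the authors' semiparametric estimator: it assumes a binary outcome Y and a single binary covariate X, simulated error-prone versions Y* and X* with correlated misclassification, a simple random Phase II audit, and a saturated multinomial model for (Y*, X*, Y, X). An EM algorithm treats the true (Y, X) as missing for unaudited records, so it uses information from both phases; it is compared with a naive analysis of the error-prone data and a complete-case analysis of the audited records only. All parameter values (a true odds ratio of 2, misclassification rates between 10% and 30%, 50,000 records with 10,000 audited) and the function names `simulate`, `cell_counts`, `em_joint` and `odds_ratio` are hypothetical.

```python
import numpy as np

TRUE_OR = 2.0  # assumed true odds ratio for this toy simulation


def simulate(n=50_000, n2=10_000, seed=2021):
    """Toy two-phase data: binary X, Y and error-prone copies X*, Y* (all assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.4, n)
    p_y = 1.0 / (1.0 + np.exp(-(-1.0 + np.log(TRUE_OR) * x)))
    y = rng.binomial(1, p_y)
    # Correlated errors: Y* is more likely to be wrong when X* is wrong.
    flip_x = rng.random(n) < 0.15
    flip_y = rng.random(n) < np.where(flip_x, 0.30, 0.10)
    x_star = np.where(flip_x, 1 - x, x)
    y_star = np.where(flip_y, 1 - y, y)
    # Phase II: simple random sample of n2 records selected for auditing.
    audited = np.zeros(n, dtype=bool)
    audited[rng.choice(n, size=n2, replace=False)] = True
    return y, x, y_star, x_star, audited


def cell_counts(y, x, y_star, x_star, audited):
    """Phase II counts over (Y*, X*, Y, X); Phase I-only counts over (Y*, X*)."""
    full = np.zeros((2, 2, 2, 2))
    np.add.at(full, (y_star[audited], x_star[audited], y[audited], x[audited]), 1)
    part = np.zeros((2, 2))
    np.add.at(part, (y_star[~audited], x_star[~audited]), 1)
    return full, part


def em_joint(full, part, max_iter=1000, tol=1e-12):
    """EM for the saturated multinomial P(Y*, X*, Y, X); (Y, X) missing outside Phase II."""
    n = full.sum() + part.sum()
    p = np.full((2, 2, 2, 2), 1.0 / 16)
    for _ in range(max_iter):
        # E-step: split each unaudited (Y*, X*) count across (Y, X) via P(Y, X | Y*, X*).
        cond = p / p.sum(axis=(2, 3), keepdims=True)
        expected = full + part[:, :, None, None] * cond
        # M-step: the multinomial MLE is the expected cell count divided by n.
        p_new = expected / n
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return p


def odds_ratio(table):
    """Odds ratio from a 2x2 table (counts or probabilities) indexed [y, x]."""
    return (table[1, 1] * table[0, 0]) / (table[1, 0] * table[0, 1])


y, x, y_star, x_star, audited = simulate()
full, part = cell_counts(y, x, y_star, x_star, audited)

# Naive: treat the error-prone Phase I data as if they were correct.
naive = np.zeros((2, 2))
np.add.at(naive, (y_star, x_star), 1)
or_naive = odds_ratio(naive)

# Complete case: audited records only, ignoring Phase I-only records.
or_cc = odds_ratio(full.sum(axis=(0, 1)))

# EM using both phases, then marginalize over (Y*, X*) to get P(Y, X).
p_hat = em_joint(full, part)
or_em = odds_ratio(p_hat.sum(axis=(0, 1)))

# Closed form for the saturated model: P(Y*, X*) from all records, P(Y, X | Y*, X*) from Phase II.
n_s = full.sum(axis=(2, 3)) + part
p_closed = (n_s / n_s.sum())[:, :, None, None] * full / full.sum(axis=(2, 3), keepdims=True)

print(f"naive OR {or_naive:.2f}, complete-case OR {or_cc:.2f}, EM OR {or_em:.2f}")
```

Because this toy model is saturated, the EM fixed point coincides with the closed-form `p_closed`, which is computed here only as a check. In richer models, for example a regression model for Y given X with additional covariates, such a closed form is generally unavailable, which is where an iterative EM algorithm becomes useful.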

Similar articles

Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors.

Stat Med. 2021 Feb 10;40(3):725-738. doi: 10.1002/sim.8799. Epub 2020 Nov 3.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Efficient Semiparametric Inference Under Two-Phase Sampling, With Applications to Genetic Association Studies.

J Am Stat Assoc. 2017;112(520):1468-1476. doi: 10.1080/01621459.2017.1295864. Epub 2017 Feb 28.

Accounting for data errors discovered from an audit in multiple linear regression.

Biometrics. 2011 Sep;67(3):1083-91. doi: 10.1111/j.1541-0420.2010.01543.x. Epub 2011 Jan 31.

Correcting for Measurement Error in Time-Varying Covariates in Marginal Structural Models.

Am J Epidemiol. 2016 Aug 1;184(3):249-58. doi: 10.1093/aje/kww068. Epub 2016 Jul 13.

Improved generalized raking estimators to address dependent covariate and failure-time outcome error.

Biom J. 2021 Jun;63(5):1006-1027. doi: 10.1002/bimj.202000187. Epub 2021 Mar 11.

Efficient Estimation of Semiparametric Transformation Models for Two-Phase Cohort Studies.

J Am Stat Assoc. 2014 Jan 1;109(505):371-383. doi: 10.1080/01621459.2013.842172.

Estimation and inference of error-prone covariate effect in the presence of confounding variables.

Electron J Stat. 2017;11(1):480-501. doi: 10.1214/17-EJS1242. Epub 2017 Mar 2.

Two-Phase Sampling Designs for Data Validation in Settings with Covariate Measurement Error and Continuous Outcome.

J R Stat Soc Ser A Stat Soc. 2021 Oct;184(4):1368-1389. doi: 10.1111/rssa.12689. Epub 2021 Apr 15.

Cited by

Ascertainment Conditional Maximum Likelihood for Continuous Outcome Under Two-Phase Response-Selective Design.

Stat Med. 2025 Jul;44(15-17):e70111. doi: 10.1002/sim.70111.

Combining Straight-Line and Map-Based Distances to Investigate the Connection Between Proximity to Healthy Foods and Disease.

Stat Med. 2025 Mar 30;44(7):e70054. doi: 10.1002/sim.70054.

Optimal multiwave validation of secondary use data with outcome and exposure misclassification.

Can J Stat. 2024 Jun;52(2):532-554. doi: 10.1002/cjs.11772. Epub 2023 Mar 31.

Association analysis of self-reported outcomes with a validated subset.

Stat Med. 2024 Feb 20;43(4):642-655. doi: 10.1002/sim.9976. Epub 2023 Dec 13.

Lessons learned from over a decade of data audits in international observational HIV cohorts in Latin America and East Africa.

J Clin Transl Sci. 2023 Nov 3;7(1):e245. doi: 10.1017/cts.2023.659. eCollection 2023.

Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.

Stat Commun Infect Dis. 2020 Oct 7;12(Suppl1):20190015. doi: 10.1515/scid-2019-0015. eCollection 2020 Sep 1.
