Leveraging undecided cases in chart-reviewed phenotypes to enhance EHR-based association studies.

Author information

Jian Xinyao, Zhang Dazheng, Yu Zehao, Xu Hua, Bian Jiang, Wu Yonghui, Tong Jiayi, Chen Yong

Affiliations

The Center for Health Analytics and Synthesis of Evidence (CHASE), University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA.

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Publication information

J Biomed Inform. 2025 Jun;166:104839. doi: 10.1016/j.jbi.2025.104839. Epub 2025 Apr 30.

Abstract

OBJECTIVES

In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patients' clinical outcomes into binary categories but are susceptible to misclassification. The gold standard, manual chart review, relies on clinicians to determine the true disease status from their assessment of the health records. These clinician-labeled phenotypes are labor-intensive to obtain, are typically limited to a small subset of patients, and can introduce a third, "undecided" category when the phenotype is indeterminate. We aim to integrate algorithm-derived and chart-reviewed outcomes effectively when both are available in EHR-based association studies.
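To make the misclassification concern concrete, the Python simulation below compares the log odds ratio estimated from the true outcome with the one estimated from an error-prone, algorithm-derived outcome. It is a minimal illustration under assumed values, not part of TriCA: the exposure prevalence (0.3), the logistic coefficients (-2.0, 1.0), and the algorithm's sensitivity (0.80) and specificity (0.95) are hypothetical, and the errors are assumed nondifferential (independent of the exposure).

```python
import math
import random


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds_ratio(pairs):
    """Log odds ratio from a 2x2 table of (exposure x, outcome y) pairs."""
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for x, y in pairs:
        counts[(x, y)] += 1
    a, b = counts[(1, 1)], counts[(1, 0)]  # exposed: outcome yes / no
    c, d = counts[(0, 1)], counts[(0, 0)]  # unexposed: outcome yes / no
    return math.log((a * d) / (b * c))


def simulate(n=200_000, beta0=-2.0, beta1=1.0, sens=0.80, spec=0.95, seed=1):
    """Return (log OR using the true outcome, log OR using the algorithm outcome)."""
    rng = random.Random(seed)
    truth, algo = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.3 else 0
        y = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
        # Nondifferential misclassification: error rates do not depend on x.
        if y == 1:
            s = 1 if rng.random() < sens else 0
        else:
            s = 0 if rng.random() < spec else 1
        truth.append((x, y))
        algo.append((x, s))
    return log_odds_ratio(truth), log_odds_ratio(algo)


true_est, naive_est = simulate()
print(f"log OR, true outcome:      {true_est:.3f}")
print(f"log OR, algorithm outcome: {naive_est:.3f}")
```

With a binary exposure the true log odds ratio equals beta1 = 1.0. Under these assumptions the algorithm-based estimate should land near 0.73 instead, the attenuation toward the null that nondifferential outcome misclassification typically produces.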

MATERIAL AND METHODS

We propose an augmented estimation method that combines binary algorithm-derived phenotypes for the entire cohort with trinary chart-reviewed phenotypes for a small, selected subset. In addition, a cost-effective outcome-dependent sampling strategy is used to address rare-disease scenarios. The proposed trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA) was evaluated across a wide range of simulation settings and in real-world applications, including EHR data on Alzheimer's disease and related dementias (ADRD) from the OneFlorida+ Clinical Research Network and cohort data on second breast cancer events (SBCE) from Kaiser Permanente Washington.
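The validation design can be illustrated with a generic sketch. Because chart review is expensive and the disease is rare, the charts sent for review can be chosen with probabilities that depend on the algorithm-derived label, which is observed for everyone; inverse-probability (Hájek) weights then undo the oversampling. This is not the TriCA estimator: the cohort, its label frequencies, the sampling probabilities (all algorithm-positives, 5% of algorithm-negatives), and the function names are hypothetical, and the code only estimates the cohort share of each trinary chart-review label rather than fitting an association model.

```python
import random
from collections import Counter

# Trinary chart-review outcome; "undecided" is kept as its own category.
LABELS = ("case", "control", "undecided")


def simulate_cohort(n, seed=7):
    """Hypothetical cohort of (algorithm label s, chart-review label) pairs.

    The chart-review label is generated for everyone only so the weighted
    estimate can be checked; in practice it is seen only for reviewed charts.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        s = 1 if rng.random() < 0.05 else 0  # rare algorithm-positive label
        r = rng.random()
        if s == 1:
            chart = "case" if r < 0.70 else ("undecided" if r < 0.85 else "control")
        else:
            chart = "case" if r < 0.01 else ("undecided" if r < 0.03 else "control")
        cohort.append((s, chart))
    return cohort


def outcome_dependent_sample(cohort, p_sample, seed=11):
    """Send a chart for review with probability p_sample[s], which depends only
    on the algorithm-derived label s; each reviewed chart carries weight 1/p."""
    rng = random.Random(seed)
    return [
        (s, chart, 1.0 / p_sample[s])
        for s, chart in cohort
        if rng.random() < p_sample[s]
    ]


def label_shares(sample, weighted=True):
    """Share of each chart-review label, optionally inverse-probability weighted."""
    totals = Counter()
    for _s, chart, w in sample:
        totals[chart] += w if weighted else 1.0
    total = sum(totals.values())
    return {label: totals[label] / total for label in LABELS}


cohort = simulate_cohort(100_000)
# Review every algorithm-positive chart but only 5% of algorithm-negative ones.
sample = outcome_dependent_sample(cohort, p_sample={1: 1.0, 0: 0.05})
ipw_shares = label_shares(sample)
raw_shares = label_shares(sample, weighted=False)
full = Counter(chart for _s, chart in cohort)

print(f"reviewed charts: {len(sample)} of {len(cohort)}")
for label in LABELS:
    print(f"{label:9s} weighted={ipw_shares[label]:.3f} "
          f"unweighted={raw_shares[label]:.3f} "
          f"full cohort={full[label] / len(cohort):.3f}")
```

An unweighted summary of the reviewed subset overstates the share of cases several-fold, because algorithm-positive patients are overrepresented among the reviewed charts; the weights recover the cohort-level shares while keeping "undecided" as a category instead of discarding it.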

RESULTS

Compared to estimation based on random sampling, our augmented method reduced mean squared error by up to 28.3% in simulation studies; compared to estimation using only trinary chart-reviewed phenotypes, it improved efficiency by up to 33.3% in the ADRD data and 50.8% in the SBCE data.
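The abstract reports percentage gains without stating how they are computed, and common conventions differ. The snippet below uses hypothetical variances, not values from the paper, to show how a single pair of estimator variances yields 25% under a "relative reduction" convention and 33.3% under a "variance ratio minus one" convention; which definition the authors use should be checked in the full text.

```python
def pct_reduction(reference: float, proposed: float) -> float:
    """Percent drop of a loss such as MSE or variance: 100 * (1 - proposed/reference)."""
    return 100.0 * (1.0 - proposed / reference)


def pct_efficiency_gain(var_reference: float, var_proposed: float) -> float:
    """Percent gain in relative efficiency: 100 * (var_reference/var_proposed - 1)."""
    return 100.0 * (var_reference / var_proposed - 1.0)


# Hypothetical variances, not taken from the paper.
var_reference, var_proposed = 4.0, 3.0
print(f"{pct_reduction(var_reference, var_proposed):.1f}")        # 25.0
print(f"{pct_efficiency_gain(var_reference, var_proposed):.1f}")  # 33.3
```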

DISCUSSION

Our simulation studies and real-world applications demonstrate that, compared to existing methods, the proposed method provides unbiased estimates with higher statistical efficiency.

CONCLUSION

The proposed method effectively combines binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set, making it applicable to a broader range of applications and enhancing risk factor identification in EHR-based association studies.
