A strategy for validation of variables derived from large-scale electronic health record data.

Author affiliations

VA San Diego Healthcare System, 3500 La Jolla Village Dr, San Diego, CA 92161, USA; University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA.

University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA.

Publication information

J Biomed Inform. 2021 Sep;121:103879. doi: 10.1016/j.jbi.2021.103879. Epub 2021 Jul 27.

DOI:10.1016/j.jbi.2021.103879

PMID:34329789

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9615095/

Abstract

PURPOSE

Standardized approaches for rigorous validation of phenotyping from large-scale electronic health record (EHR) data have not been widely reported. We propose a methodologically rigorous and efficient approach to guide such validation, the San Diego Approach to Variable Validation (SDAVV), which includes strategies for sampling cases and controls, determining sample sizes, estimating algorithm performance, and terminating the validation process.

METHODS

We propose sample size formulae, to be applied before chart review, that are based on pre-specified critical lower bounds for positive predictive value (PPV) and negative predictive value (NPV). We also propose a stepwise strategy for iterative cycles of algorithm development and validation, in which sample sizes for data abstraction are updated until both PPV and NPV achieve target performance.
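The abstract does not reproduce the sample size formulae, so the following is only a minimal sketch of the general idea, not the SDAVV formulae themselves. It assumes an exact (Clopper-Pearson) lower confidence limit for a proportion, and the function names, the `alpha` default, and the search cap `max_n` are all hypothetical choices made for illustration. Given an expected PPV (or NPV) and a critical lower bound, it searches for the smallest number of charts at which the lower limit would clear that bound if the expected value were observed.

```python
from math import comb, floor


def upper_tail(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(successes, n + 1))


def exact_lower_bound(successes, n, alpha=0.025):
    """Clopper-Pearson lower limit: the p at which P(X >= successes) = alpha.

    Assumption: an exact binomial limit stands in for whatever interval
    the authors used; alpha=0.025 corresponds to a two-sided 95% interval.
    """
    if successes == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the tail probability increases with p
        mid = (lo + hi) / 2
        if upper_tail(successes, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_sample_size(expected, critical, alpha=0.025, max_n=2000):
    """Smallest n whose lower limit reaches `critical` when the observed
    proportion equals `expected` (rounded down to whole charts)."""
    for n in range(1, max_n + 1):
        successes = floor(expected * n + 1e-9)
        if exact_lower_bound(successes, n, alpha) >= critical:
            return n
    return None  # target not reachable within max_n charts


if __name__ == "__main__":
    # Hypothetical planning values, not taken from the paper.
    n = min_sample_size(expected=0.97, critical=0.90)
    print("charts to review per group:", n)
```

In an iterative cycle, the same lower-limit check could be re-run after each round of chart review, stopping once both the PPV and NPV limits exceed their critical bounds.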

RESULTS

We applied the SDAVV to a Department of Veterans Affairs study in which we created two phenotyping algorithms: one for distinguishing normal colonoscopy cases from abnormal colonoscopy controls, and one for identifying aspirin exposure. For identifying normal colonoscopy cases, estimated PPV and NPV both reached 0.970 (95% confidence lower bound 0.915), with an estimated sensitivity of 0.963 and specificity of 0.975. The algorithm for identifying aspirin exposure reached a PPV of 0.990 (95% lower bound 0.950) and an NPV of 0.980 (95% lower bound 0.930), with a sensitivity of 0.960 and a specificity of 1.000.
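The abstract reports sensitivity and specificity alongside PPV and NPV but does not say how they were derived. When charts are sampled by algorithm classification, PPV and NPV are estimated directly, while sensitivity and specificity also depend on the share of the population that the algorithm flags as positive. The sketch below shows that relationship using Bayes' rule; the function name and the `flagged_fraction` parameter are hypothetical, and the input values are illustrative rather than taken from the study.

```python
def sens_spec_from_predictive_values(ppv, npv, flagged_fraction):
    """Convert PPV/NPV into sensitivity/specificity.

    flagged_fraction is the (assumed known) proportion of the full
    population that the algorithm classifies as positive.
    """
    tp = ppv * flagged_fraction                # true positives, as a population share
    fp = (1 - ppv) * flagged_fraction          # false positives
    tn = npv * (1 - flagged_fraction)          # true negatives
    fn = (1 - npv) * (1 - flagged_fraction)    # false negatives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Illustrative inputs only.
    sens, spec = sens_spec_from_predictive_values(0.97, 0.97, 0.5)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With equal PPV and NPV and half of the population flagged, sensitivity and specificity equal the predictive values; as the flagged fraction moves away from one half, the two measures diverge.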

CONCLUSIONS

A structured approach for prospectively developing and validating phenotyping algorithms from large-scale EHR data can be successfully implemented, and should be considered to improve the quality of "big data" research.
