Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large, Accurately Curated Cohort from EHR data: A Case Study in the Context of a Clinical Trial Emulation.

Authors

Mutalik Pradeep, Cheung Kei-Hoi, Green Jennifer, Buelt-Gebhardt Melissa, Anderson Karen F, Jeanpaul Vales, McDonald Linda, Wininger Michael, Li Yuli, Rajeevan Nallakkandi, Jessel Peter M, Moore Hans, Adabag Selçuk, Raitt Merritt H, Aslan Mihaela

Affiliations

VA Cooperative Studies Program Clinical Epidemiology Research Center (CSP-CERC), VA Connecticut Healthcare System, West Haven, CT.

Yale University School of Medicine, New Haven, CT.

Publication information

AMIA Annu Symp Proc. 2025 May 22;2024:847-856. eCollection 2024.

PMID: 40417550

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12099393/

Abstract

The aim of this work was to create a gold-standard curated cohort of 10,000+ cases from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) for virtual emulation of a randomized clinical trial (CSP#592). The trial had six inclusion/exclusion criteria that lacked adequate structured data. We therefore used a hybrid computer/human approach to extract information from clinical notes. Rule-based NLP output was iteratively adjudicated by a panel of trained non-clinician content experts and non-experts using an easy-to-use, spreadsheet-based rapid adjudication display. This group-adjudication process iteratively sharpened both the computer algorithm and the clinical decision criteria, while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Fewer than 0.5% of cases required referral to specialist clinicians. It is likely that such curated datasets, which capture specialist reasoning and use a process-supervised approach, will acquire greater importance as training tools for future clinical AI applications.
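The abstract does not give the rules or the layout of the adjudication spreadsheet, so the following is only a minimal sketch of the general idea: rule-based matching flags candidate text in a note, and each match is written out with its surrounding context as a spreadsheet row for a human reviewer to decide on. The criterion names, regular expressions, column names, and sample note are all hypothetical and are not taken from the study.

```python
import csv
import io
import re

# Hypothetical criteria and patterns, invented for illustration only.
# The study's actual rules are not described in the abstract.
RULES = {
    "low_ejection_fraction": re.compile(
        r"\bEF\s*(?:of|is|=|:)?\s*(\d{1,2})\s*%", re.IGNORECASE
    ),
    "prior_device": re.compile(r"\b(?:pacemaker|defibrillator)\b", re.IGNORECASE),
}


def find_candidates(note_id, text, window=40):
    """Return one row per rule match, with surrounding context for a reviewer."""
    rows = []
    for criterion, pattern in RULES.items():
        for m in pattern.finditer(text):
            start = max(0, m.start() - window)
            end = min(len(text), m.end() + window)
            rows.append({
                "note_id": note_id,
                "criterion": criterion,
                "match": m.group(0),
                "context": text[start:end].replace("\n", " "),
                "reviewer_decision": "",  # left blank for the human adjudicator
            })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text that a spreadsheet program can open."""
    fields = ["note_id", "criterion", "match", "context", "reviewer_decision"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up note text; the second match is negated ("No pacemaker"), which the
# simple rule does not detect and which a reviewer would catch from the context.
note = "Echo today: EF 30%. No pacemaker in place."
rows = find_candidates("note-001", note)
sheet = to_csv(rows)
```

In a loop of this kind, reviewer decisions that disagree with a rule (such as the negated mention above) would feed back into revised patterns for the next round, which is one plausible reading of the iterative sharpening the abstract describes.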


Similar articles

Natural language processing to identify lupus nephritis phenotype in electronic health records.

BMC Med Inform Decis Mak. 2024 Mar 3;22(Suppl 2):348. doi: 10.1186/s12911-024-02420-7.

Natural Language Processing Algorithm to Extract Multiple Myeloma Stage From Oncology Notes in the Veterans Affairs Healthcare System.

JCO Clin Cancer Inform. 2024 Jul;8:e2300197. doi: 10.1200/CCI.23.00197.

A large language model-based generative natural language processing framework fine-tuned on clinical notes accurately extracts headache frequency from electronic health records.

Headache. 2024 Apr;64(4):400-409. doi: 10.1111/head.14702. Epub 2024 Mar 25.

Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study.

J Med Internet Res. 2022 Aug 30;24(8):e40384. doi: 10.2196/40384.

Identifying Information Gaps in Electronic Health Records by Using Natural Language Processing: Gynecologic Surgery History Identification.

J Med Internet Res. 2022 Jan 28;24(1):e29015. doi: 10.2196/29015.

A method for cohort selection of cardiovascular disease records from an electronic health record system.

Int J Med Inform. 2017 Jun;102:138-149. doi: 10.1016/j.ijmedinf.2017.03.015. Epub 2017 Mar 30.

Natural language processing of radiology reports for identification of skeletal site-specific fractures.

BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):73. doi: 10.1186/s12911-019-0780-5.

The use of natural language processing to identify vaccine-related anaphylaxis at five health care systems in the Vaccine Safety Datalink.

Pharmacoepidemiol Drug Saf. 2020 Feb;29(2):182-188. doi: 10.1002/pds.4919. Epub 2019 Dec 3.

Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing.

J Am Med Inform Assoc. 2024 Oct 1;31(10):2217-2227. doi: 10.1093/jamia/ocae177.
