Development and validation of MedDRA Tagger: a tool for extraction and structuring medical information from clinical notes.

Authors

Humbert-Droz Marie, Corley Jessica, Tamang Suzanne, Gevaert Olivier

Affiliations

Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA.

Meharry Medical College, Nashville, Tennessee.

Publication information

medRxiv. 2022 Dec 14:2022.12.14.22283470. doi: 10.1101/2022.12.14.22283470.

DOI:10.1101/2022.12.14.22283470

PMID:36561189

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9774225/

Abstract

Rapid and automated extraction of clinical information from patients' notes is a desirable though difficult task. Natural language processing (NLP) and machine learning have great potential to automate and accelerate such applications, but developing such models can require a large amount of labeled clinical text, which can be a slow and laborious process. To address this gap, we propose the MedDRA tagger, a fast annotation tool that makes use of industrial level libraries such as spaCy, biomedical ontologies and weak supervision to annotate and extract clinical concepts at scale. The tool can be used to annotate clinical text and obtain labels for training machine learning models and further refine the clinical concept extraction performance, or to extract clinical concepts for observational study purposes. To demonstrate the usability and versatility of our tool, we present three different use cases: we use the tagger to determine patients with a primary brain cancer diagnosis, we show evidence of rising mental health symptoms at the population level and our last use case shows the evolution of COVID-19 symptomatology throughout three waves between February 2020 and October 2021. The validation of our tool showed good performance on both specific annotations from our development set (F1 score 0.81) and open source annotated data set (F1 score 0.79). We successfully demonstrate the versatility of our pipeline with three different use cases. Finally, we note that the modular nature of our tool allows for a straightforward adaptation to another biomedical ontology. We also show that our tool is independent of EHR system, and as such generalizable.
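
The abstract reports F1 scores for concept extraction driven by an ontology. The sketch below illustrates that idea in miniature, under loud assumptions: it is not the authors' implementation, it uses a plain regular-expression lookup from the Python standard library instead of spaCy or weak supervision, and the lexicon entries, the function names `tag_concepts` and `f1_score`, and the sample note are all hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical mini-lexicon mapping surface terms to concept labels.
# Illustrative only: these are not entries taken from the paper.
LEXICON: Dict[str, str] = {
    "headache": "Headache",
    "shortness of breath": "Dyspnoea",
    "fever": "Pyrexia",
}


def tag_concepts(text: str, lexicon: Dict[str, str]) -> Set[Tuple[int, int, str]]:
    """Return (start, end, concept) for every lexicon term found in the text."""
    spans = set()
    for term, concept in lexicon.items():
        # Whole-word, case-insensitive match of the literal term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end(), concept))
    return spans


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of labels."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up note; the last symptom is negated, which a plain lookup cannot detect.
note = "Patient reports fever and shortness of breath; denies headache."
spans = tag_concepts(note, LEXICON)
predicted_concepts = {concept for _, _, concept in spans}

# Hypothetical gold annotation: only the symptoms the patient actually has.
gold_concepts = {"Pyrexia", "Dyspnoea"}
f1 = f1_score(predicted_concepts, gold_concepts)
print(sorted(predicted_concepts), round(f1, 2))
```

In this toy example the negated mention is tagged as a false positive, which lowers precision while recall stays perfect. Handling negation and context is one reason a dictionary lookup alone is usually not enough for clinical text.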

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3dc/9774225/6686659d5f71/nihpp-2022.12.14.22283470v1-f0001.jpg
