Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework.

Affiliations

Biomedical & Health Informatics, University of Washington, Box 358047, Seattle, WA 98109, USA.

Department of Electrical & Computer Engineering, University of Washington, Campus Box 352500 185, Seattle, WA 98195-2500, USA.

Publication information

J Biomed Inform. 2021 May;117:103761. doi: 10.1016/j.jbi.2021.103761. Epub 2021 Mar 26.

DOI: 10.1016/j.jbi.2021.103761

PMID: 33781918

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7997694/

Abstract

Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g. vital signs and laboratory results) and automatically extracted symptom information, to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance, beyond structured data alone.
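
The F1 ranges quoted in the abstract come from comparing extracted spans against gold annotations. The following is a minimal sketch of how such a score can be computed, assuming exact-match scoring on (start, end, label) tuples. The abstract does not state the paper's matching criteria, so the exact-match rule, the `Span` representation, the function name `span_f1`, and the sample data are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# (start token, end token, label); a hypothetical span representation.
Span = Tuple[int, int, str]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Exact-match F1: a prediction counts only if start, end, and label all match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case, avoiding division by zero.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only, not drawn from the CACT corpus.
gold = [(0, 1, "Symptom"), (4, 5, "Symptom"), (9, 9, "COVID")]
pred = [(0, 1, "Symptom"), (4, 6, "Symptom"), (9, 9, "COVID")]
score = span_f1(gold, pred)  # second predicted span has the wrong end offset
```

Under exact matching, a span with a boundary error counts as both a false positive and a false negative, which is why relaxed (overlap-based) criteria usually give higher scores on the same predictions.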

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d07/7997694/0c47a5560312/ga1_lrg.jpg

Similar articles

Extracting COVID-19 Diagnoses and Symptoms From Clinical Text: A New Annotated Corpus and Neural Event Extraction Framework.

ArXiv. 2021 Mar 10:arXiv:2012.00974v2.

Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.

J Biomed Inform. 2020 Feb;102:103354. doi: 10.1016/j.jbi.2019.103354. Epub 2019 Dec 12.

Annotating social determinants of health using active learning, and characterizing determinants using neural event extraction.

J Biomed Inform. 2021 Jan;113:103631. doi: 10.1016/j.jbi.2020.103631. Epub 2020 Dec 5.

Automated Travel History Extraction From Clinical Notes for Informing the Detection of Emergent Infectious Disease Events: Algorithm Development and Validation.

JMIR Public Health Surveill. 2021 Mar 24;7(3):e26719. doi: 10.2196/26719.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

A Natural Language Processing Tool Offering Data Extraction for COVID-19 Related Information (DECOVRI).

Stud Health Technol Inform. 2022 Jun 6;290:1062-1063. doi: 10.3233/SHTI220268.

Temporal information extraction from mental health records to identify duration of untreated psychosis.

J Biomed Semantics. 2020 Mar 10;11(1):2. doi: 10.1186/s13326-020-00220-2.

Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project.

BMJ Open. 2017 Jan 17;7(1):e012012. doi: 10.1136/bmjopen-2016-012012.

Extraction of temporal relations from clinical free text: A systematic review of current approaches.

J Biomed Inform. 2020 Aug;108:103488. doi: 10.1016/j.jbi.2020.103488. Epub 2020 Jul 13.

Cited by

Joint event extraction model based on dynamic attention matching and graph attention networks.

Sci Rep. 2025 Feb 26;15(1):6900. doi: 10.1038/s41598-025-91501-2.

Coronavirus Anatomy and Its Analytical Approaches for Targeting COVID-19.

Adv Exp Med Biol. 2024;1457:33-44. doi: 10.1007/978-3-031-61939-7_2.

Annotation of epilepsy clinic letters for natural language processing.

J Biomed Semantics. 2024 Sep 15;15(1):17. doi: 10.1186/s13326-024-00316-z.

CACER: Clinical concept Annotations for Cancer Events and Relations.

J Am Med Inform Assoc. 2024 Nov 1;31(11):2583-2594. doi: 10.1093/jamia/ocae231.

Exploring COVID-related relationship extraction: Contrasting data sources and analyzing misinformation.

Heliyon. 2024 Feb 28;10(5):e26973. doi: 10.1016/j.heliyon.2024.e26973. eCollection 2024 Mar 15.

Generalizing through Forgetting - Domain Generalization for Symptom Event Extraction in Clinical Notes.

AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:622-631. eCollection 2023.

COVID-19 advising application development for Apple devices (iOS).

PeerJ Comput Sci. 2023 Mar 13;9:e1274. doi: 10.7717/peerj-cs.1274. eCollection 2023.

Computer-aided methods for combating Covid-19 in prevention, detection, and service provision approaches.

Neural Comput Appl. 2023;35(20):14739-14778. doi: 10.1007/s00521-023-08612-y. Epub 2023 May 5.

Supporting the Diagnosis of Fabry Disease Using a Natural Language Processing-Based Approach.

J Clin Med. 2023 May 22;12(10):3599. doi: 10.3390/jcm12103599.

Association of Weight Loss in Ambulatory Care Settings With First Diagnosis of Lung Cancer in the US.

JAMA Netw Open. 2023 May 1;6(5):e2312042. doi: 10.1001/jamanetworkopen.2023.12042.
