Information extraction from electronic medical documents: state of the art and future research directions.

Authors

Landolsi Mohamed Yassine, Hlaoua Lobna, Ben Romdhane Lotfi

Affiliation

MARS Research Laboratory, SDM Research Group, ISITCom, University of Sousse, Hammam Sousse, Tunisia.

Publication information

Knowl Inf Syst. 2023;65(2):463-516. doi: 10.1007/s10115-022-01779-1. Epub 2022 Nov 8.

DOI:10.1007/s10115-022-01779-1

PMID:36405956

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9640816/

Abstract

In the medical field, a doctor must acquire comprehensive knowledge by reading and writing narrative documents, and is responsible for every decision made for patients. Unfortunately, reading all the necessary information about drugs, diseases and patients is very tiring because the volume of documents is large and grows every day. Consequently, many medical errors can occur, some of them fatal. Information extraction is an important field that can address this problem. It comprises several tasks that extract the desired information from unstructured text written in natural language. The principal tasks are named entity recognition and relation extraction, since they structure the text by extracting the relevant information. Processing narrative text, however, requires natural language processing techniques to extract useful information and features. In this paper, we introduce and discuss the techniques and solutions used in these tasks. Furthermore, we outline the challenges of information extraction from medical documents. To our knowledge, this is the most comprehensive survey in the literature, with an experimental analysis and suggestions for some unexplored directions.

Similar articles

MLM-based typographical error correction of unstructured medical texts for named entity recognition.

BMC Bioinformatics. 2022 Nov 16;23(1):486. doi: 10.1186/s12859-022-05035-9.

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

J Biomed Semantics. 2016 Apr 27;7:22. doi: 10.1186/s13326-016-0059-z. eCollection 2016.

[A customized method for information extraction from unstructured text data in the electronic medical records].

Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263.

Facilitating clinical research through automation: Combining optical character recognition with natural language processing.

Clin Trials. 2022 Oct;19(5):504-511. doi: 10.1177/17407745221093621. Epub 2022 May 24.

Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.

J Biomed Inform. 2020 Feb;102:103354. doi: 10.1016/j.jbi.2019.103354. Epub 2019 Dec 12.

Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

Extracting comprehensive clinical information for breast cancer using deep learning methods.

Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.

Extraction of Information Related to Drug Safety Surveillance From Electronic Health Record Notes: Joint Modeling of Entities and Relations Using Knowledge-Aware Neural Attentive Models.

JMIR Med Inform. 2020 Jul 10;8(7):e18417. doi: 10.2196/18417.

Extracting entities with attributes in clinical text via joint deep learning.

J Am Med Inform Assoc. 2019 Dec 1;26(12):1584-1591. doi: 10.1093/jamia/ocz158.

Cited by

Performance of Natural Language Processing for Information Extraction From Electronic Health Records Within Cancer: Systematic Review.

JMIR Med Inform. 2025 Sep 12;13:e68707. doi: 10.2196/68707.

Using large language models to extract information from pediatric clinical reports.

PLOS Digit Health. 2025 Jul 23;4(7):e0000919. doi: 10.1371/journal.pdig.0000919. eCollection 2025 Jul.

A serialization method for digitizing the image-based medical laboratory report.

Digit Health. 2025 Apr 15;11:20552076251334431. doi: 10.1177/20552076251334431. eCollection 2025 Jan-Dec.

Benzodiazepine Initiation and the Risk of Falls or Fall-Related Injuries in Older Adults Following Acute Ischemic Stroke.

Neurol Clin Pract. 2025 Jun;15(3):e200452. doi: 10.1212/CPJ.0000000000200452. Epub 2025 Mar 18.

Scalable information extraction from free text electronic health records using large language models.

BMC Med Res Methodol. 2025 Jan 28;25(1):23. doi: 10.1186/s12874-025-02470-z.

A Large Language Model to Detect Negated Expressions in Radiology Reports.

J Imaging Inform Med. 2025 Jun;38(3):1297-1303. doi: 10.1007/s10278-024-01274-9. Epub 2024 Sep 25.

Privacy-preserving large language models for structured medical information retrieval.

NPJ Digit Med. 2024 Sep 20;7(1):257. doi: 10.1038/s41746-024-01233-2.

LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models.

medRxiv. 2024 Sep 3:2024.09.02.24312917. doi: 10.1101/2024.09.02.24312917.

CACER: Clinical concept Annotations for Cancer Events and Relations.

J Am Med Inform Assoc. 2024 Nov 1;31(11):2583-2594. doi: 10.1093/jamia/ocae231.

Use of Generative AI to Identify Helmet Status Among Patients With Micromobility-Related Injuries From Unstructured Clinical Notes.

JAMA Netw Open. 2024 Aug 1;7(8):e2425981. doi: 10.1001/jamanetworkopen.2024.25981.
