A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.

Authors

Wu Yonghui, Denny Joshua C, Rosenbloom S Trent, Miller Randolph A, Giuse Dario A, Xu Hua

Affiliation

Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.

Publication information

AMIA Annu Symp Proc. 2012;2012:997-1003. Epub 2012 Nov 3.

PMID:23304375

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3540461/

Abstract

Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compared the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES. The evaluation used an expert-annotated gold standard set of clinical documents (derived from 32 de-identified patient discharge summaries) containing 1,112 abbreviations. The existing NLP systems achieved suboptimal performance in abbreviation identification, with F-scores ranging from 0.165 to 0.601. MedLEE achieved the best F-score: 0.601 for all abbreviations and 0.705 for clinically relevant abbreviations. This study suggests that accurate identification of clinical abbreviations is a challenging task and that more advanced abbreviation recognition modules might improve existing clinical NLP systems.
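
For readers unfamiliar with the metric, the F-score reported above is conventionally the harmonic mean of precision and recall. The sketch below shows that standard calculation in Python. The function name `precision_recall_f` and the counts are hypothetical illustrations, not values from the study; only the gold-standard total of 1,112 abbreviations comes from the abstract, and the abstract does not state the matching criteria the authors used.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F-score) from raw match counts.

    Uses the balanced F-score (harmonic mean of precision and recall).
    Any ratio with a zero denominator is reported as 0.0.
    """
    detected = true_positives + false_positives   # abbreviations the system reported
    gold = true_positives + false_negatives       # abbreviations in the gold standard
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / gold if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical counts for illustration only (NOT results from the paper).
# They are chosen so that the gold-standard size is 600 + 512 = 1,112.
p, r, f = precision_recall_f(true_positives=600,
                             false_positives=200,
                             false_negatives=512)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a system that misses many gold-standard abbreviations scores a low F even when most of what it does report is correct.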
