A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.

Author information

Wu Yonghui, Denny Joshua C, Rosenbloom S Trent, Miller Randolph A, Giuse Dario A, Xu Hua

Affiliation

Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.

Publication information

AMIA Annu Symp Proc. 2012;2012:997-1003. Epub 2012 Nov 3.

Abstract

Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compared the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES. The evaluation used an expert-annotated gold standard set of clinical documents (derived from 32 de-identified patient discharge summaries) containing 1,112 abbreviations. The existing NLP systems achieved suboptimal performance in abbreviation identification, with F-scores ranging from 0.165 to 0.601. MedLEE achieved the best F-score of 0.601 for all abbreviations and 0.705 for clinically relevant abbreviations. This study suggested that accurate identification of clinical abbreviations is a challenging task and that more advanced abbreviation recognition modules might improve existing clinical NLP systems.
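The F-scores reported above combine precision and recall into a single number. The abstract does not state the matching criterion used, so the following is only a minimal sketch that assumes exact matching of abbreviation mentions against the gold standard. The `Mention` tuple layout, the function name `precision_recall_f1`, and the toy data are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# Hypothetical mention representation: (document id, character offset, surface form).
Mention = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Mention], predicted: Set[Mention]) -> Tuple[float, float, float]:
    """Compute precision, recall, and balanced F-score under exact matching.

    Exact matching is an assumption made for this sketch; the study may have
    used a different criterion.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data for illustration only (not from the paper's gold standard).
gold_mentions = {(1, 0, "pt"), (1, 10, "CHF"), (1, 20, "s/p"), (2, 5, "BP")}
system_mentions = {(1, 0, "pt"), (1, 10, "CHF"), (2, 9, "mg")}

p, r, f = precision_recall_f1(gold_mentions, system_mentions)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this toy case the system finds two of the four gold mentions and makes one spurious prediction, so recall is lower than precision and the F-score falls between the two.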
