Automated disambiguation of acronyms and abbreviations in clinical texts: window and training size considerations.

Authors

Moon Sungrim, Pakhomov Serguei, Melton Genevieve B

Affiliation

Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA.

Publication information

AMIA Annu Symp Proc. 2012;2012:1310-9. Epub 2012 Nov 3.

PMID: 23304410

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3540435/

Abstract

Acronyms and abbreviations within electronic clinical texts are widespread and often associated with multiple senses. Automated acronym sense disambiguation (WSD), a task of assigning the context-appropriate sense to ambiguous clinical acronyms and abbreviations, represents an active problem for medical natural language processing (NLP) systems. In this paper, fifty clinical acronyms and abbreviations with 500 samples each were studied using supervised machine-learning techniques (Support Vector Machines (SVM), Naïve Bayes (NB), and Decision Trees (DT)) to optimize the window size and orientation and determine the minimum training sample size needed for optimal performance. Our analysis of window size and orientation showed best performance using a larger left-sided and smaller right-sided window. To achieve an accuracy of over 90%, the minimum required training sample size was approximately 125 samples for SVM classifiers with inverted cross-validation. These findings support future work in clinical acronym and abbreviation WSD and require validation with other clinical texts.
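
The window size and orientation idea in the abstract can be illustrated with a minimal sketch, assuming whitespace-tokenized text and a plain bag-of-words representation. The function names `context_window` and `bag_of_words`, the example sentence, and the window sizes used here are hypothetical choices made for illustration; the abstract does not state the window sizes or the feature set the authors used.

```python
from collections import Counter


def context_window(tokens, target_index, left, right):
    """Return (left_context, right_context) around tokens[target_index].

    `left` and `right` are the number of tokens to keep on each side,
    so an asymmetric window is simply left != right. The target token
    itself is excluded from both contexts.
    """
    start = max(0, target_index - left)
    left_ctx = tokens[start:target_index]
    right_ctx = tokens[target_index + 1:target_index + 1 + right]
    return left_ctx, right_ctx


def bag_of_words(left_ctx, right_ctx):
    """Count lowercased context tokens, ignoring position and side."""
    return Counter(word.lower() for word in left_ctx + right_ctx)


if __name__ == "__main__":
    # Illustrative sentence, not taken from the study corpus.
    sentence = "patient with history of RA on methotrexate presents with joint pain"
    tokens = sentence.split()
    target = tokens.index("RA")

    # Larger left-sided window, smaller right-sided window (sizes are arbitrary).
    left_ctx, right_ctx = context_window(tokens, target, left=3, right=2)
    features = bag_of_words(left_ctx, right_ctx)
    print(left_ctx, right_ctx, dict(features))
```

Feature counts like these would then be passed to a supervised classifier such as an SVM; varying `left` and `right` independently is one way to compare window orientations.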

Similar articles

Automated non-alphanumeric symbol resolution in clinical texts.

AMIA Annu Symp Proc. 2011;2011:979-86. Epub 2011 Oct 22.

Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data.

AMIA Annu Symp Proc. 2017 Feb 10;2016:560-569. eCollection 2016.

Word Sense Disambiguation of clinical abbreviations with hyperdimensional computing.

AMIA Annu Symp Proc. 2013 Nov 16;2013:1007-16. eCollection 2013.

Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues.

BMC Bioinformatics. 2006 Jul 5;7:334. doi: 10.1186/1471-2105-7-334.

A comparative study of supervised learning as applied to acronym expansion in clinical reports.

AMIA Annu Symp Proc. 2006;2006:399-403.

A multi-aspect comparison study of supervised word sense disambiguation.

J Am Med Inform Assoc. 2004 Jul-Aug;11(4):320-31. doi: 10.1197/jamia.M1533. Epub 2004 Apr 2.

A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.

Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. eCollection 2015.

Abbreviation and acronym disambiguation in clinical discourse.

AMIA Annu Symp Proc. 2005;2005:589-93.

A deep database of medical abbreviations and acronyms for natural language processing.

Sci Data. 2021 Jun 2;8(1):149. doi: 10.1038/s41597-021-00929-4.

Cited by

Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning.

Methods Inf Med. 2024 Dec;63(5-06):195-202. doi: 10.1055/a-2521-4372. Epub 2025 Jan 22.

Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study.

JMIR Med Inform. 2024 Oct 1;12:e56955. doi: 10.2196/56955.

Exploring Large Language Models for Acronym, Symbol Sense Disambiguation, and Semantic Similarity and Relatedness Assessment.

AMIA Jt Summits Transl Sci Proc. 2024 May 31;2024:324-333. eCollection 2024.

Sequence Labeling for Disambiguating Medical Abbreviations.

J Healthc Inform Res. 2023 Sep 14;7(4):501-526. doi: 10.1007/s41666-023-00146-1. eCollection 2023 Dec.

A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry.

Graefes Arch Clin Exp Ophthalmol. 2023 Nov;261(11):3335-3344. doi: 10.1007/s00417-023-06190-2. Epub 2023 Aug 3.

Deciphering clinical abbreviations with a privacy protecting machine learning system.

Nat Commun. 2022 Dec 2;13(1):7456. doi: 10.1038/s41467-022-35007-9.

Disambiguating Clinical Abbreviations Using a One-Fits-All Classifier Based on Deep Learning Techniques.

Methods Inf Med. 2022 Jun;61(S 01):e28-e34. doi: 10.1055/s-0042-1742388. Epub 2022 Feb 1.

Automatically disambiguating medical acronyms with ontology-aware deep learning.

Nat Commun. 2021 Sep 7;12(1):5319. doi: 10.1038/s41467-021-25578-4.

The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1529-1537. doi: 10.1093/jamia/ocaa106.

Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis.

Future Gener Comput Syst. 2019 Sep;98:238-251. doi: 10.1016/j.future.2019.01.016. Epub 2019 Mar 28.
