Automated disambiguation of acronyms and abbreviations in clinical texts: window and training size considerations.

Authors

Moon Sungrim, Pakhomov Serguei, Melton Genevieve B

Affiliation

Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA.

Publication

AMIA Annu Symp Proc. 2012;2012:1310-9. Epub 2012 Nov 3.

Abstract

Acronyms and abbreviations are widespread in electronic clinical texts and often have multiple senses. Automated acronym and abbreviation sense disambiguation, a form of word sense disambiguation (WSD) that assigns the context-appropriate sense to an ambiguous clinical acronym or abbreviation, remains an active problem for medical natural language processing (NLP) systems. In this paper, fifty clinical acronyms and abbreviations with 500 samples each were studied using supervised machine-learning techniques (Support Vector Machines (SVM), Naïve Bayes (NB), and Decision Trees (DT)) to optimize the window size and orientation and to determine the minimum training sample size needed for optimal performance. Our analysis of window size and orientation showed the best performance with a larger left-sided and smaller right-sided window. To achieve an accuracy of over 90%, the minimum required training sample size was approximately 125 samples for SVM classifiers with inverted cross-validation. These findings support future work in clinical acronym and abbreviation WSD and require validation with other clinical texts.
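As a rough illustration of what an asymmetric context window means here, the sketch below collects bag-of-words counts from more tokens to the left of an ambiguous abbreviation than to the right. The function name `window_features`, the window sizes (4 left, 2 right), and the example sentence are assumptions made for illustration; the abstract does not state the exact window sizes or feature representation used in the study.

```python
from collections import Counter


def window_features(tokens, target_index, left=4, right=2):
    """Count lowercased context words in an asymmetric window.

    `left` and `right` are hypothetical window sizes, not values
    reported by the paper.
    """
    start = max(0, target_index - left)
    left_ctx = tokens[start:target_index]
    right_ctx = tokens[target_index + 1:target_index + 1 + right]
    return Counter(word.lower() for word in left_ctx + right_ctx)


# Hypothetical clinical sentence containing the ambiguous abbreviation "PT".
tokens = "Patient was started on PT for gait training after surgery".split()
features = window_features(tokens, tokens.index("PT"), left=4, right=2)
print(features)
```

Counts like these could then be passed to a supervised classifier such as an SVM, with one classifier trained per abbreviation; that pairing is one plausible setup, not a description of the authors' pipeline.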
