Applied Computing Department, Palestine Technical University - Kadoorie, Tulkarem, Palestine.
Department of Computer Science, Universidad Carlos III de Madrid, Leganés, Spain.
Methods Inf Med. 2022 Jun;61(S 01):e28-e34. doi: 10.1055/s-0042-1742388. Epub 2022 Feb 1.
Abbreviations are considered an essential part of the clinical narrative; they are used not only to save time and space but also to conceal serious or incurable illnesses. Misinterpreting clinical abbreviations can affect patients themselves as well as other services such as clinical support systems. There is no consensus in the scientific community on how new abbreviations are created, which makes them difficult to understand. Disambiguating clinical abbreviations aims to predict the exact meaning of an abbreviation from its context, a crucial step in understanding clinical notes.
Disambiguating clinical abbreviations is an essential task in information extraction from medical texts. Deep contextualized representation models have shown promising results on most word sense disambiguation tasks. In this work, we propose a one-fits-all classifier that disambiguates clinical abbreviations using deep contextualized representations from pretrained language models such as Bidirectional Encoder Representations from Transformers (BERT).
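To make the setup concrete, below is a minimal sketch of such a one-fits-all classifier using the Hugging Face transformers library. The checkpoint name is a publicly available clinical BERT, while the sense inventory and the predict_sense helper are hypothetical illustrations under assumed choices, not the authors' exact architecture.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical shared sense inventory: all senses of all abbreviations live
# in one label space, so a single classifier covers every abbreviation.
SENSES = ["atrial fibrillation", "acid-fast bacilli", "aspirin", "arterial blood gas"]

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT", num_labels=len(SENSES)
)  # the classification head is randomly initialized until fine-tuned

def predict_sense(abbreviation: str, context: str) -> str:
    # Pair the abbreviation with its surrounding context so the deep
    # contextualized representation can resolve the intended sense.
    inputs = tokenizer(abbreviation, context, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return SENSES[int(logits.argmax(dim=-1))]

# Meaningful only after fine-tuning on sense-annotated clinical notes.
print(predict_sense("ASA", "Continue ASA 81 mg daily for cardioprotection."))
```

Because every sense shares one label space, rare abbreviations benefit from parameters trained on all the other abbreviations, which is the motivation for the one-fits-all design.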
A set of experiments with different pretrained clinical BERT models was performed to investigate fine-tuning methods for disambiguating clinical abbreviations. One-fits-all classifiers were used to improve the disambiguation of rare clinical abbreviations.
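A hedged sketch of how such a fine-tuning comparison might be run with the transformers Trainer API is shown below. The Hub checkpoint identifiers are illustrative public names for the three clinical models, and the hyperparameters and data set handles (train_ds, eval_ds) are assumptions, not the authors' reported configuration.

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# Illustrative public Hub identifiers for the three clinical checkpoints;
# verify against the model cards before use.
CHECKPOINTS = {
    "Bioclinical": "emilyalsentzer/Bio_ClinicalBERT",
    "BlueBERT": "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12",
    "MS_BERT": "NLP4H/ms_bert",
}

def fine_tune(name, checkpoint, train_ds, eval_ds, num_labels):
    """Fine-tune one checkpoint on the shared sense-label space and
    return its evaluation metrics. Hyperparameters are assumptions."""
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels)
    args = TrainingArguments(
        output_dir=f"out/{name}",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=eval_ds)
    trainer.train()
    return trainer.evaluate()

# Usage (train_ds / eval_ds would be tokenized, label-encoded splits of the
# University of Minnesota data set; num_labels is the sense-inventory size):
# for name, ckpt in CHECKPOINTS.items():
#     print(name, fine_tune(name, ckpt, train_ds, eval_ds, num_labels=500))
```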
One-fits-all classifiers with deep contextualized representations from the Bioclinical, BlueBERT, and MS_BERT pretrained models improved accuracy on the University of Minnesota data set, achieving 98.99, 98.75, and 99.13%, respectively. All models outperform the previous state-of-the-art of around 98.39%, with the MS_BERT model achieving the best accuracy.
Deep contextualized representations obtained by fine-tuning pretrained language models proved effective for disambiguating clinical abbreviations; the approach can be robust to rare and unseen abbreviations and avoids building a separate classifier for each abbreviation. Transfer learning can thus support the development of practical abbreviation disambiguation systems.