Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study.

Affiliations

Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan.

Department of Nursing, Fooyin University, Kaohsiung, Taiwan.

Publication information

JMIR Med Inform. 2024 Oct 1;12:e56955. doi: 10.2196/56955.

DOI:10.2196/56955

PMID:39352715

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11460304/

Abstract

BACKGROUND

Electronic medical records store extensive patient data and serve as a comprehensive repository, including textual medical records like surgical and imaging reports. Their utility in clinical decision support systems is substantial, but the widespread use of ambiguous and unstandardized abbreviations in clinical documents poses challenges for natural language processing in clinical decision support systems. Efficient abbreviation disambiguation methods are needed for effective information extraction.

OBJECTIVE

This study aims to enhance the one-to-all (OTA) framework for clinical abbreviation expansion, which uses a single model to predict multiple abbreviation meanings. The objective is to improve OTA by developing context-candidate pairs and optimizing word embeddings in Bidirectional Encoder Representations From Transformers (BERT), evaluating the model's efficacy in expanding clinical abbreviations using real data.

METHODS

Three datasets were used: Medical Subject Headings Word Sense Disambiguation, University of Minnesota, and a Chia-Yi Christian Hospital dataset from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous abbreviations were preprocessed and formatted for BERT. Two pretrained models, ClinicalBERT and BlueBERT, were fine-tuned, with dataset pairs for training and testing generated following Huang et al's method.
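As a rough illustration of what context-candidate pairing can look like, the sketch below expands one sentence containing an abbreviation into one pair per candidate sense, so that a single binary classifier can score every abbreviation. The abstract does not specify the pair format of Huang et al's method, so the sense inventory, the function names `build_pairs` and `to_bert_input`, and the binary labeling scheme are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sense inventory; not taken from the study's datasets.
SENSE_INVENTORY: Dict[str, List[str]] = {
    "RA": ["rheumatoid arthritis", "right atrium", "room air"],
}


def build_pairs(
    context: str, abbreviation: str, gold_sense: Optional[str] = None
) -> List[Tuple[str, str, int]]:
    """Return one (context, candidate, label) triple per candidate sense.

    The label is 1 when the candidate equals the gold sense and 0 otherwise
    (assumed binary scheme). With gold_sense=None every label is 0, which is
    the shape one would use at inference time.
    """
    pairs = []
    for candidate in SENSE_INVENTORY[abbreviation]:
        label = int(candidate == gold_sense)
        pairs.append((context, candidate, label))
    return pairs


def to_bert_input(context: str, candidate: str) -> str:
    """Join a pair in the usual BERT two-segment layout (shown as plain text)."""
    return f"[CLS] {context} [SEP] {candidate} [SEP]"


if __name__ == "__main__":
    example = build_pairs("Patient saturating well on RA.", "RA", "room air")
    for context, candidate, label in example:
        print(label, to_bert_input(context, candidate))
```

Under this assumed layout, prediction would amount to scoring every pair for a given context and choosing the candidate with the highest score, which is what lets one model cover many abbreviations.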

RESULTS

BlueBERT achieved macro- and microaccuracies of 95.41% and 95.16%, respectively, on the Medical Subject Headings Word Sense Disambiguation dataset. It improved macroaccuracy by 0.54%-1.53% compared to two baselines, long short-term memory and deepBioWSD with random embedding. On the University of Minnesota dataset, BlueBERT recorded macro- and microaccuracies of 98.40% and 98.22%, respectively. Against the baselines of Word2Vec + support vector machine and BioWordVec + support vector machine, BlueBERT demonstrated a macroaccuracy improvement of 2.61%-4.13%.
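The macro- and microaccuracies reported above can differ because they average differently. The sketch below assumes the usual word sense disambiguation convention, which the abstract does not spell out: microaccuracy is the share of all instances predicted correctly, and macroaccuracy is the unweighted mean of the per-abbreviation accuracies. The function name and the toy records are hypothetical and are not results from the study.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def macro_micro_accuracy(
    records: Iterable[Tuple[str, str, str]]
) -> Tuple[float, float]:
    """Compute (macro, micro) accuracy from (abbreviation, gold, predicted) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for abbreviation, gold, predicted in records:
        total[abbreviation] += 1
        correct[abbreviation] += int(gold == predicted)
    # Micro: pool every instance, so frequent abbreviations dominate.
    micro = sum(correct.values()) / sum(total.values())
    # Macro: average the per-abbreviation accuracies, so each abbreviation counts equally.
    macro = sum(correct[a] / total[a] for a in total) / len(total)
    return macro, micro


if __name__ == "__main__":
    # Toy data: "RA" has 3 of 4 correct, "MS" has 1 of 1 correct.
    toy = [
        ("RA", "room air", "room air"),
        ("RA", "right atrium", "right atrium"),
        ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
        ("RA", "room air", "right atrium"),
        ("MS", "multiple sclerosis", "multiple sclerosis"),
    ]
    macro, micro = macro_micro_accuracy(toy)
    print(f"macro={macro:.3f} micro={micro:.3f}")
```

In the toy data the macro value (0.875) exceeds the micro value (0.8) because the rarer abbreviation is predicted perfectly and carries the same weight as the more frequent one.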

CONCLUSIONS

This research preliminarily validated the effectiveness of the OTA method for abbreviation disambiguation in medical texts, demonstrating the potential to enhance both clinical staff efficiency and research effectiveness.

Similar articles

Disambiguating Clinical Abbreviations Using a One-Fits-All Classifier Based on Deep Learning Techniques.

Methods Inf Med. 2022 Jun;61(S 01):e28-e34. doi: 10.1055/s-0042-1742388. Epub 2022 Feb 1.

A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD).

J Am Med Inform Assoc. 2017 Apr 1;24(e1):e79-e86. doi: 10.1093/jamia/ocw109.

Improving clinical abbreviation sense disambiguation using attention-based Bi-LSTM and hybrid balancing techniques in imbalanced datasets.

J Eval Clin Pract. 2024 Oct;30(7):1327-1336. doi: 10.1111/jep.14041. Epub 2024 Jun 21.

Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data.

AMIA Annu Symp Proc. 2017 Feb 10;2016:560-569. eCollection 2016.

A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.

Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. eCollection 2015.

deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.

J Am Med Inform Assoc. 2019 May 1;26(5):438-446. doi: 10.1093/jamia/ocy189.

Leveraging Large Language Models for Clinical Abbreviation Disambiguation.

J Med Syst. 2024 Feb 27;48(1):27. doi: 10.1007/s10916-024-02049-z.

Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.

JMIR Med Inform. 2025 Jan 6;13:e63020. doi: 10.2196/63020.

Link-topic model for biomedical abbreviation disambiguation.

J Biomed Inform. 2015 Feb;53:367-80. doi: 10.1016/j.jbi.2014.12.013. Epub 2014 Dec 30.

Cited by

Using Natural Language Processing and Machine Learning to classify the status of kidney allograft in Electronic Medical Records written in Spanish.

PLoS One. 2025 May 8;20(5):e0322587. doi: 10.1371/journal.pone.0322587. eCollection 2025.

References

A hybrid system to understand the relations between assessments and plans in progress notes.

J Biomed Inform. 2023 May;141:104363. doi: 10.1016/j.jbi.2023.104363. Epub 2023 Apr 11.

Leveraging unstructured electronic medical record notes to derive population-specific suicide risk models.

Psychiatry Res. 2022 Sep;315:114703. doi: 10.1016/j.psychres.2022.114703. Epub 2022 Jul 1.

Development and assessment of a natural language processing model to identify residential instability in electronic health records' unstructured data: a comparison of 3 integrated healthcare delivery systems.

JAMIA Open. 2022 Feb 16;5(1):ooac006. doi: 10.1093/jamiaopen/ooac006. eCollection 2022 Apr.

Deep learning model for multi-classification of infectious diseases from unstructured electronic medical records.

BMC Med Inform Decis Mak. 2022 Feb 16;22(1):41. doi: 10.1186/s12911-022-01776-y.

Natural Language Processing Enhances Prediction of Functional Outcome After Acute Ischemic Stroke.

J Am Heart Assoc. 2021 Dec 21;10(24):e023486. doi: 10.1161/JAHA.121.023486. Epub 2021 Nov 19.

A deep database of medical abbreviations and acronyms for natural language processing.

Sci Data. 2021 Jun 2;8(1):149. doi: 10.1038/s41597-021-00929-4.

Predicting mortality in critically ill patients with diabetes using machine learning and clinical notes.

BMC Med Inform Decis Mak. 2020 Dec 30;20(Suppl 11):295. doi: 10.1186/s12911-020-01318-4.

Combining structured and unstructured data for predictive models: a deep learning approach.

BMC Med Inform Decis Mak. 2020 Oct 29;20(1):280. doi: 10.1186/s12911-020-01297-6.

EMR-Based Phenotyping of Ischemic Stroke Using Supervised Machine Learning and Text Mining Techniques.

IEEE J Biomed Health Inform. 2020 Oct;24(10):2922-2931. doi: 10.1109/JBHI.2020.2976931. Epub 2020 Feb 28.

BioWordVec, improving biomedical word embeddings with subword information and MeSH.

Sci Data. 2019 May 10;6(1):52. doi: 10.1038/s41597-019-0055-0.
