Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan.
Department of Nursing, Fooyin University, Kaohsiung, Taiwan.
JMIR Med Inform. 2024 Oct 1;12:e56955. doi: 10.2196/56955.
Electronic medical records store extensive patient data, including free-text documents such as surgical and imaging reports, and serve as a comprehensive repository for clinical information. Their utility in clinical decision support systems is substantial, but the widespread use of ambiguous and unstandardized abbreviations in clinical documents poses challenges for natural language processing within such systems. Efficient abbreviation disambiguation methods are needed for effective information extraction.
This study aims to enhance the one-to-all (OTA) framework for clinical abbreviation expansion, which uses a single model to predict multiple abbreviation meanings. The objective is to improve OTA by constructing context-candidate pairs and optimizing the word embeddings in Bidirectional Encoder Representations from Transformers (BERT), and to evaluate the model's efficacy in expanding clinical abbreviations on real data. A sketch of the pairing idea follows.
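To make the context-candidate formulation concrete, the following minimal Python sketch (not the authors' code) shows how one abbreviation occurrence can be expanded into context-candidate pairs for binary classification, in the spirit of the pairing approach described above; the example abbreviation, senses, and sentence are hypothetical.

```python
from typing import List, Tuple

def build_pairs(context: str,
                candidate_senses: List[str],
                true_sense: str) -> List[Tuple[str, str, int]]:
    """Pair one context with every candidate expansion of the abbreviation.

    Returns (context, candidate, label) triples, where label is 1 for the
    correct expansion and 0 otherwise. Each triple can become one BERT input
    of the form [CLS] context [SEP] candidate [SEP].
    """
    return [(context, sense, int(sense == true_sense))
            for sense in candidate_senses]

# Hypothetical example: the abbreviation "RA" in a clinical note
pairs = build_pairs(
    context="Patient with RA on methotrexate, joints swollen bilaterally.",
    candidate_senses=["rheumatoid arthritis", "right atrium", "room air"],
    true_sense="rheumatoid arthritis",
)
for context, candidate, label in pairs:
    print(label, candidate)
```

Because a single binary classifier scores every (context, candidate) pair, one model can cover all abbreviations and senses, which is what allows the OTA framework to avoid training a separate classifier per abbreviation.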
Three datasets were used: the Medical Subject Headings Word Sense Disambiguation dataset, the University of Minnesota dataset, and a clinical dataset from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous abbreviations were preprocessed and formatted for BERT. The study fine-tuned two pretrained models, ClinicalBERT and BlueBERT, with training and testing pairs generated following Huang et al's method.
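A minimal fine-tuning sketch with the Hugging Face transformers library is given below for illustration. The checkpoint names are publicly released versions of BlueBERT and ClinicalBERT and are assumptions; they may not be the exact models or hyperparameters used in this study, and the toy pairs are hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed public checkpoints; swap in the checkpoint actually used if it differs.
checkpoint = "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12"
# Alternative: "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy (context, candidate, label) triples built as in the previous sketch.
pairs = [
    ("Patient with RA on methotrexate.", "rheumatoid arthritis", 1),
    ("Patient with RA on methotrexate.", "right atrium", 0),
    ("Patient with RA on methotrexate.", "room air", 0),
]

# Encode context and candidate as a sentence pair: [CLS] context [SEP] candidate [SEP]
encodings = tokenizer(
    [c for c, _, _ in pairs],
    [s for _, s, _ in pairs],
    truncation=True, padding=True, max_length=128, return_tensors="pt",
)
labels = torch.tensor([l for _, _, l in pairs])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few steps for illustration only
    optimizer.zero_grad()
    out = model(**encodings, labels=labels)
    out.loss.backward()
    optimizer.step()
```

At inference time, each candidate expansion of an abbreviation would be scored against its context and the highest-scoring candidate selected as the predicted sense.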
BlueBERT achieved macro- and microaccuracies of 95.41% and 95.16%, respectively, on the Medical Subject Headings Word Sense Disambiguation dataset. It improved macroaccuracy by 0.54%-1.53% compared to two baselines, long short-term memory and deepBioWSD with random embedding. On the University of Minnesota dataset, BlueBERT recorded macro- and microaccuracies of 98.40% and 98.22%, respectively. Against the baselines of Word2Vec + support vector machine and BioWordVec + support vector machine, BlueBERT demonstrated a macroaccuracy improvement of 2.61%-4.13%.
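For readers unfamiliar with the two reported metrics, the short Python sketch below illustrates the usual distinction: macroaccuracy averages per-abbreviation accuracies, whereas microaccuracy pools all test instances. The records shown are hypothetical.

```python
from collections import defaultdict

def macro_micro_accuracy(records):
    """records: iterable of (abbreviation, true_sense, predicted_sense)."""
    per_abbr = defaultdict(lambda: [0, 0])  # abbreviation -> [correct, total]
    correct_all, total_all = 0, 0
    for abbr, truth, pred in records:
        hit = int(truth == pred)
        per_abbr[abbr][0] += hit
        per_abbr[abbr][1] += 1
        correct_all += hit
        total_all += 1
    macro = sum(c / t for c, t in per_abbr.values()) / len(per_abbr)
    micro = correct_all / total_all
    return macro, micro

# Hypothetical predictions: "RA" gets 1 of 2 correct, "PE" gets 4 of 4 correct.
records = [
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("RA", "right atrium", "rheumatoid arthritis"),
    ("PE", "pulmonary embolism", "pulmonary embolism"),
    ("PE", "pulmonary embolism", "pulmonary embolism"),
    ("PE", "physical examination", "physical examination"),
    ("PE", "pulmonary embolism", "pulmonary embolism"),
]
print(macro_micro_accuracy(records))  # macro = (0.5 + 1.0)/2 = 0.75, micro = 5/6 ≈ 0.83
```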
This research preliminarily validated the effectiveness of the OTA method for abbreviation disambiguation in medical texts, demonstrating the potential to enhance both clinical staff efficiency and research effectiveness.