Improving clinical abbreviation sense disambiguation using attention-based Bi-LSTM and hybrid balancing techniques in imbalanced datasets.

Affiliations

Department of Computer Engineering, Zand Institute of Higher Education, Shiraz, Iran.

Department of Computer Engineering, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran.

Publication information

J Eval Clin Pract. 2024 Oct;30(7):1327-1336. doi: 10.1111/jep.14041. Epub 2024 Jun 21.

Abstract

RATIONALE

Clinical abbreviations pose a challenge for clinical decision support systems due to their ambiguity. Additionally, clinical datasets often suffer from class imbalance, hindering the classification of such data. This imbalance leads to classifiers with low accuracy and high error rates. Traditional feature-engineered models struggle with this task, and class imbalance is a known factor that reduces the performance of neural network techniques.

AIMS AND OBJECTIVES

This study proposes an attention-based bidirectional long short-term memory (Bi-LSTM) model to improve clinical abbreviation disambiguation in clinical documents. We aim to address the challenges of limited training data and class imbalance by employing data generation techniques like reverse substitution and data augmentation with synonym substitution.
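
The two data generation ideas named above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: reverse substitution is taken in its commonly used sense (a long form found in text is replaced by its abbreviation, and the long form becomes the sense label), and the function names, the toy synonym table, and the balancing loop are all hypothetical.

```python
import re
from collections import Counter


def reverse_substitute(sentence, long_form, abbreviation):
    """Replace a long form with its abbreviation.

    Returns (new_text, sense_label), or None if the long form is absent.
    Assumption: the long form itself serves as the sense label.
    """
    pattern = re.compile(r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE)
    if not pattern.search(sentence):
        return None
    return pattern.sub(abbreviation, sentence), long_form


def synonym_variants(sentence, abbreviation, synonyms):
    """Yield copies of the sentence with one context word swapped for a synonym."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok == abbreviation:
            continue  # never alter the target abbreviation itself
        for alt in synonyms.get(tok.lower(), []):
            yield " ".join(tokens[:i] + [alt] + tokens[i + 1:])


def balance(samples, abbreviation, synonyms):
    """Add synonym-substituted copies to minority senses.

    Stops adding to a sense once it reaches the majority count,
    or earlier if no more variants can be generated.
    """
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for text, label in samples:
        for variant in synonym_variants(text, abbreviation, synonyms):
            if counts[label] >= target:
                break
            balanced.append((variant, label))
            counts[label] += 1
    return balanced


# Hypothetical toy data: "RA" with two senses, one under-represented.
toy_synonyms = {"noted": ["observed"]}
toy_samples = [
    ("RA flare noted", "rheumatoid arthritis"),
    ("RA pressure normal", "right atrium"),
    ("RA enlarged on echo", "right atrium"),
]
toy_balanced = balance(toy_samples, "RA", toy_synonyms)
```

In this sketch the abbreviation token is left untouched during augmentation, so every generated sentence still contains the term to be disambiguated. A real synonym source (for example a clinical thesaurus) would replace the toy dictionary.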

METHOD

We utilise a Bi-LSTM classification model with an attention mechanism to disambiguate each abbreviation. The model's performance is evaluated based on accuracy for each abbreviation. To address the limitations of imbalanced data, we employ data generation techniques to create a more balanced dataset.
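
To make the role of the attention mechanism concrete, the sketch below shows one common form of attention pooling over per-token hidden states. It is a minimal illustration under assumptions, not the model described in the paper: the abstract does not specify the attention formulation, so dot-product scoring against a single query vector is assumed here, the hidden states are assumed to be the concatenated forward and backward LSTM outputs for each token, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention over per-token hidden states.

    hidden_states: list of equal-length vectors, one per token
                   (assumed to come from a Bi-LSTM encoder).
    query:         vector of the same length (assumed to be learned).
    Returns (context_vector, attention_weights).
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Hypothetical hidden states for a three-token context window.
toy_hidden = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
toy_query = [2.0, 0.0]
toy_context, toy_weights = attention_pool(toy_hidden, toy_query)
```

The context vector is a weighted average of the token representations, so tokens that score highly against the query contribute most to the representation passed to the sense classifier. In a trained model, the query and the encoder would be learned jointly with a final classification layer over the candidate senses of each abbreviation.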

RESULTS

The evaluation results demonstrate that our data balancing technique significantly improves the model's accuracy by 2.08%. Furthermore, the proposed attention-based Bi-LSTM model achieves an accuracy of 96.09% on the UMN dataset, outperforming state-of-the-art results.

CONCLUSION

Deep neural network methods, particularly Bi-LSTM, offer promising alternatives to traditional feature-engineered models for clinical abbreviation disambiguation. By employing data generation techniques, we can address the challenges posed by limited-resource and imbalanced clinical datasets. This approach leads to a significant improvement in model accuracy for clinical abbreviation disambiguation tasks.
