Improving clinical abbreviation sense disambiguation using attention-based Bi-LSTM and hybrid balancing techniques in imbalanced datasets

Department of Computer Engineering, Zand Institute of Higher Education, Shiraz, Iran.

Department of Computer Engineering, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran.

J Eval Clin Pract. 2024 Oct;30(7):1327-1336. doi: 10.1111/jep.14041. Epub 2024 Jun 21.

RATIONALE

Clinical abbreviations pose a challenge for clinical decision support systems due to their ambiguity. Additionally, clinical datasets often suffer from class imbalance, hindering the classification of such data. This imbalance leads to classifiers with low accuracy and high error rates. Traditional feature-engineered models struggle with this task, and class imbalance is a known factor that reduces the performance of neural network techniques.

AIMS AND OBJECTIVES

This study proposes an attention-based bidirectional long short-term memory (Bi-LSTM) model to improve clinical abbreviation disambiguation in clinical documents. We aim to address the challenges of limited training data and class imbalance by employing data generation techniques like reverse substitution and data augmentation with synonym substitution.
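The two data generation ideas named above can be illustrated with a minimal sketch. It assumes that reverse substitution means replacing a long form found in text with its abbreviation to obtain a labelled example, and that synonym substitution swaps a context word while leaving the abbreviation untouched. The sense inventory, synonym table, and function names are hypothetical and are not taken from the paper.

```python
import random
import re

# Hypothetical sense inventory: abbreviation -> candidate long forms.
SENSES = {"RA": ["rheumatoid arthritis", "right atrium"]}

# Hypothetical synonym table; a real system would use a curated lexicon.
SYNONYMS = {"patient": ["subject"], "severe": ["marked"]}


def reverse_substitute(sentence, abbreviation, long_form):
    """Replace a long form with its abbreviation.

    Returns (text, label), or None if the long form does not occur.
    """
    pattern = re.compile(r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE)
    if not pattern.search(sentence):
        return None
    return pattern.sub(abbreviation, sentence, count=1), long_form


def synonym_augment(text, protected, rng):
    """Swap one context word for a synonym, never the abbreviation itself."""
    tokens = text.split()
    candidates = [i for i, tok in enumerate(tokens)
                  if tok != protected and tok.lower() in SYNONYMS]
    if not candidates:
        return None
    i = rng.choice(candidates)
    tokens[i] = rng.choice(SYNONYMS[tokens[i].lower()])
    return " ".join(tokens)


def balance(examples, abbreviation, seed=0, max_tries=100):
    """Top up minority senses with augmented copies.

    Each sense is filled up to the size of the largest sense, or until
    max_tries attempts are used. Augmented copies may repeat.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_label.values())
    balanced = list(examples)
    for label, texts in by_label.items():
        needed = target - len(texts)
        tries = 0
        while needed > 0 and tries < max_tries:
            tries += 1
            new_text = synonym_augment(rng.choice(texts), abbreviation, rng)
            if new_text is not None:
                balanced.append((new_text, label))
                needed -= 1
    return balanced
```

For example, `reverse_substitute("History of rheumatoid arthritis noted", "RA", "rheumatoid arthritis")` yields the text "History of RA noted" labelled with the sense "rheumatoid arthritis". How the authors combine the two techniques into their hybrid balancing scheme is not described in the abstract.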

METHOD

We utilise a Bi-LSTM classification model with an attention mechanism to disambiguate each abbreviation. The model's performance is evaluated based on accuracy for each abbreviation. To address the limitations of imbalanced data, we employ data generation techniques to create a more balanced dataset.
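The abstract does not specify the form of the attention mechanism. As a rough illustration only, the sketch below shows one common choice: dot-product attention with a single query vector, applied to the per-token hidden states that a Bi-LSTM would produce (each state being the concatenation of the forward and backward outputs). The function names, the query vector, and the toy numbers are assumptions, and the LSTM itself is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse per-token hidden states into one context vector.

    hidden_states: list of T vectors, one per token (assumed to come
                   from a Bi-LSTM; values here are illustrative).
    query:         vector of the same dimension; learned in a real
                   model, fixed by hand in this sketch.
    Returns (context_vector, attention_weights).
    """
    # Score each token by its dot product with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of hidden states, one dimension at a time.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


# Toy usage: three tokens with 2-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
context, weights = attention_pool(states, query=[1.0, 0.0])
```

In a classifier of this kind, the context vector would typically be passed to a softmax layer over the candidate senses of the abbreviation, so tokens with larger attention weights contribute more to the predicted sense.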

RESULTS

The evaluation results demonstrate that our data balancing technique significantly improves the model's accuracy by 2.08%. Furthermore, the proposed attention-based Bi-LSTM model achieves an accuracy of 96.09% on the UMN dataset, outperforming state-of-the-art results.
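The abstract reports a single accuracy figure while the method evaluates each abbreviation separately, and it does not state how the per-abbreviation scores are aggregated. The sketch below shows the two usual options, macro and micro averaging, on made-up records. The record layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def per_abbreviation_accuracy(records):
    """records: iterable of (abbreviation, gold_sense, predicted_sense)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for abbr, gold, pred in records:
        total[abbr] += 1
        correct[abbr] += int(gold == pred)
    return {abbr: correct[abbr] / total[abbr] for abbr in total}


def macro_average(accuracies):
    """Unweighted mean over abbreviations: each abbreviation counts once."""
    return sum(accuracies.values()) / len(accuracies)


def micro_average(records):
    """Mean over all instances: frequent abbreviations count more."""
    records = list(records)
    hits = sum(int(gold == pred) for _, gold, pred in records)
    return hits / len(records)


# Made-up predictions for two abbreviations.
records = [
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("RA", "rheumatoid arthritis", "right atrium"),
    ("PT", "physical therapy", "physical therapy"),
    ("PT", "physical therapy", "physical therapy"),
    ("PT", "prothrombin time", "prothrombin time"),
    ("PT", "prothrombin time", "physical therapy"),
]
scores = per_abbreviation_accuracy(records)
```

The two averages can differ noticeably when abbreviations have very different instance counts, which is worth keeping in mind when comparing a reported figure against other systems.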

CONCLUSION

Deep neural network methods, particularly Bi-LSTM, offer promising alternatives to traditional feature-engineered models for clinical abbreviation disambiguation. By employing data generation techniques, we can address the challenges posed by limited-resource and imbalanced clinical datasets. This approach leads to a significant improvement in model accuracy for clinical abbreviation disambiguation tasks.
