
DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning.

Affiliation

King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.

Publication information

Comput Biol Med. 2024 Mar;170:107921. doi: 10.1016/j.compbiomed.2024.107921. Epub 2024 Jan 4.

Abstract

It is wise to study past and present epidemics in order to learn from them and be better prepared for future ones. COVID-19 is one of the most recent and best-known pandemics, and its effects are still felt today. Nearly all governments have announced various measures to combat the virus, making it challenging to keep people aware of the most up-to-date and relevant information. As a result, many websites have created and maintained Frequently Asked Questions (FAQs) about the pandemic. People naturally tend to ask about several points in one question, which leads to multi-label questions. Multi-label question classification is one of the most common and complicated tasks in Natural Language Processing (NLP). One of classification's most significant contributions to advancing medical care and facilities is the development of automated question-answering systems. These systems can improve the efficiency of healthcare by reducing the burden on healthcare professionals and providing patients with timely and reliable answers to their questions. Because of the intricate morphology and structure of the Arabic language, the task becomes more challenging when dealing with Arabic text. This study aims to build a multi-label classification model for Arabic medical questions. Research on pre-trained neural models has significantly improved NLP performance, and such models have recently been applied to multi-label classification. This study proposes a deep learning model for classifying Arabic multi-label COVID-19 questions by combining the strengths of DeBERTa (Decoding-enhanced BERT with Disentangled Attention) and BiLSTM (Bidirectional Long Short-Term Memory) networks. Deep learning methods are prevalent because they generate dense feature representations automatically and implicitly capture hidden relationships. The DeBERTa model is fine-tuned to generate word vector representations, and these word vectors are fed to the BiLSTM model to extract deeper feature representations. The proposed multi-label classification model assigns each question to one or more of ten available categories. The model is evaluated using Hamming loss, micro-precision, micro-recall, micro-F1, subset accuracy, AUC, and Jaccard index. It classified Arabic questions effectively, with encouraging performance: 0.042 for Hamming loss; 0.84 for each of micro-precision, micro-recall, and micro-F1; 0.71 for subset accuracy; 0.89 for AUC; and 0.72 for Jaccard index. These results pave the way for adopting an automated multi-label classification model for medical questions in health facilities, which can help telehealth providers deliver more reliable and effective consultations.
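
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how Hamming loss, micro-averaged precision/recall/F1, subset accuracy, and the Jaccard index are commonly computed for multi-label predictions. It is not the authors' evaluation code. The function names, the toy data (four questions, three categories rather than ten), and the choice of a per-question (sample-averaged) Jaccard index are all assumptions made for illustration; the abstract does not state which Jaccard averaging was used. AUC is left out because it requires predicted scores rather than hard 0/1 labels.

```python
from typing import Sequence, Tuple

# Each row is one question; each entry is 1 if the category applies, else 0.
Labels = Sequence[Sequence[int]]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual (question, category) slots predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        1
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
        if t != p
    )
    return wrong / total


def micro_precision_recall_f1(y_true: Labels, y_pred: Labels) -> Tuple[float, float, float]:
    """Pool true/false positives over all categories, then compute P, R, F1."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of questions whose full label set is predicted exactly."""
    exact = sum(1 for t_row, p_row in zip(y_true, y_pred) if list(t_row) == list(p_row))
    return exact / len(y_true)


def jaccard_index(y_true: Labels, y_pred: Labels) -> float:
    """Mean over questions of |true AND pred| / |true OR pred| (assumed averaging)."""
    scores = []
    for t_row, p_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(t_row, p_row) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(t_row, p_row) if t == 1 or p == 1)
        # Assumed convention: two empty label sets count as a perfect match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 1]]
    print("Hamming loss:", hamming_loss(y_true, y_pred))
    print("Micro P/R/F1:", micro_precision_recall_f1(y_true, y_pred))
    print("Subset accuracy:", subset_accuracy(y_true, y_pred))
    print("Jaccard index:", jaccard_index(y_true, y_pred))
```

Subset accuracy is the strictest of these measures, since a question counts as correct only when every one of its labels matches. That may explain why the reported subset accuracy (0.71) sits below the micro-F1 (0.84), which gives credit for partially correct label sets.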

