Arabic dialect identification in social media: A hybrid model with transformer models and BiLSTM.

Author

Alsuwaylimi Amjad A

Affiliation

Department of Information Technology, Faculty of Computing and Information Technology, Northern Border University, Rafha, 91911, Saudi Arabia.

Publication information

Heliyon. 2024 Aug 13;10(17):e36280. doi: 10.1016/j.heliyon.2024.e36280. eCollection 2024 Sep 15.

DOI:10.1016/j.heliyon.2024.e36280
PMID:39296033
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11408018/
Abstract

Arabic Dialect Identification (ADI) is a challenging task in natural language processing because of the diversity and regional variation of Arabic dialects. Despite previous efforts, the task remains difficult. This study therefore uses transformers to address ADI on social media. Two hybrid models are proposed: one combines Bidirectional Long Short-Term Memory (BiLSTM) with CAMeLBERT, and the other combines BiLSTM with ALBERT. In addition, a novel dataset is introduced, comprising 121,289 user-generated comments from various social media platforms in four major Arabic dialects (Egyptian, Jordanian, Gulf and Yemeni). Several experiments were conducted using conventional Machine Learning Classifiers (MLCs) and Deep Learning Models (DLMs) as baselines to measure the performance and effectiveness of the proposed models. Binary classification is also performed between pairs of dialects to determine which are closest to each other. Model performance is measured using common metrics such as precision, recall, F-score and F-measure. The experimental results demonstrate the superiority of the proposed hybrid models in ADI: CAMeLBERT with BiLSTM and ALBERT with BiLSTM reached accuracies of 87.67% and 86.51%, respectively.
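
The abstract reports per-model accuracy and names precision, recall and F-measure as evaluation metrics for a four-way dialect classifier. As a minimal sketch of how such metrics are typically computed for a multi-class task, the following uses only the Python standard library. The function names and the toy labels are hypothetical and invented for illustration; they are not the paper's code, data or results.

```python
from collections import Counter

# The four dialect classes named in the abstract.
DIALECTS = ["Egyptian", "Jordanian", "Gulf", "Yemeni"]


def per_class_metrics(y_true, y_pred, labels=DIALECTS):
    """Return {label: (precision, recall, f1)} for multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # this dialect was predicted, but wrongly
            fn[truth] += 1  # the true dialect was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of comments whose predicted dialect matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy labels, for illustration only (not data from the paper).
y_true = ["Egyptian", "Egyptian", "Jordanian", "Gulf",
          "Gulf", "Yemeni", "Yemeni", "Yemeni"]
y_pred = ["Egyptian", "Gulf", "Jordanian", "Gulf",
          "Yemeni", "Yemeni", "Yemeni", "Gulf"]

scores = per_class_metrics(y_true, y_pred)
# Macro-averaging weights every dialect equally, regardless of class size.
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for label, (p, r, f1) in scores.items():
    print(f"{label:<10} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy(y_true, y_pred):.3f} macro_f1={macro_f1:.3f}")
```

Per-class scores make confusions between neighbouring dialects visible in a way that a single accuracy figure does not, which is one reason they are usually reported alongside it. Whether the paper averages its scores in the macro fashion shown here is not stated in the abstract.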



Similar articles

1
Arabic dialect identification in social media: A hybrid model with transformer models and BiLSTM.
Heliyon. 2024 Aug 13;10(17):e36280. doi: 10.1016/j.heliyon.2024.e36280. eCollection 2024 Sep 15.
2
DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning.
Comput Biol Med. 2024 Mar;170:107921. doi: 10.1016/j.compbiomed.2024.107921. Epub 2024 Jan 4.
3
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications.
Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909.
4
Pretrained Transformer Language Models Versus Pretrained Word Embeddings for the Detection of Accurate Health Information on Arabic Social Media: Comparative Study.
JMIR Form Res. 2022 Jun 29;6(6):e34834. doi: 10.2196/34834.
5
Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis.
Sensors (Basel). 2022 May 12;22(10):3707. doi: 10.3390/s22103707.
6
Hate speech detection with ADHAR: a multi-dialectal hate speech corpus in Arabic.
Front Artif Intell. 2024 May 30;7:1391472. doi: 10.3389/frai.2024.1391472. eCollection 2024.
7
Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models.
Heliyon. 2022 Jun 10;8(6):e09683. doi: 10.1016/j.heliyon.2022.e09683. eCollection 2022 Jun.
8
Hate speech detection in the Arabic language: corpus design, construction, and evaluation.
Front Artif Intell. 2024 Feb 20;7:1345445. doi: 10.3389/frai.2024.1345445. eCollection 2024.
9
A transformer fine-tuning strategy for text dialect identification.
Neural Comput Appl. 2023;35(8):6115-6124. doi: 10.1007/s00521-022-07944-5. Epub 2022 Nov 15.
10
Detecting and Analyzing Suicidal Ideation on Social Media Using Deep Learning and Machine Learning Models.
Int J Environ Res Public Health. 2022 Oct 3;19(19):12635. doi: 10.3390/ijerph191912635.

Cited by

1
Advancing Arabic dialect detection with hybrid stacked transformer models.
Front Hum Neurosci. 2025 Feb 11;19:1498297. doi: 10.3389/fnhum.2025.1498297. eCollection 2025.
