



A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units.

Affiliations

School of Computer Science and Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Korea.

Department of Computer Science, Durham University, Stockton Road, Durham DH1 3LE, UK.

Publication Info

Sensors (Basel). 2021 Sep 29;21(19):6509. doi: 10.3390/s21196509.

DOI: 10.3390/s21196509
PMID: 34640835
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8512729/
Abstract

Languages that allow free word order, such as Arabic dialects, pose significant difficulty for neural machine translation (NMT) because they contain many rare words that NMT systems translate poorly. Because NMT systems operate with a fixed-size vocabulary, out-of-vocabulary words are represented by Unknown Word (UNK) tokens. Rare words can instead be encoded entirely as sequences of subword pieces using the WordPiece model. This paper introduces the first Transformer-based neural machine translation model for Arabic dialects that employs subword units. The proposed solution builds on the recently introduced Transformer model. Using subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) improves the encoder's multi-head attention sublayers by capturing the overall dependencies between the words of the dialectal input sentence. Experiments are carried out on the Levantine Arabic (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic (MAG) to MSA, Gulf Arabic to MSA, Nile Arabic to MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the proposed model adequately addresses the unknown-word issue and improves the quality of translation from Arabic dialects to MSA.
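The abstract's central mechanism — encoding a rare word as a sequence of known subword pieces so that no UNK token is needed — can be sketched with a greedy longest-match segmenter in the WordPiece style. The toy vocabulary, function name, and `##` continuation-prefix convention below are illustrative assumptions, not details taken from the paper:

```python
# Minimal sketch of WordPiece-style subword segmentation (greedy longest match).
# A word absent from the vocabulary is still representable as known pieces;
# only a word with no matching pieces at all falls back to [UNK].

def wordpiece_segment(word, vocab, unk="[UNK]"):
    """Greedily split `word` into the longest matching vocabulary pieces.
    Non-initial pieces carry the '##' continuation prefix."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # mark word-internal continuation
            if cand in vocab:
                piece = cand
                break
            end -= 1  # shrink the candidate from the right
        if piece is None:  # no vocabulary piece matches at this position
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

vocab = {"trans", "##form", "##er", "##lation"}
print(wordpiece_segment("transformer", vocab))  # ['trans', '##form', '##er']
print(wordpiece_segment("translation", vocab))  # ['trans', '##lation']
```

With a shared source/target subword vocabulary, as the paper describes, both the dialectal input and the MSA output are tokenized into the same piece inventory, which keeps the Transformer's embedding table fixed-size while leaving no word unrepresentable.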


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/72696e9c1642/sensors-21-06509-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/dad2d6b109c2/sensors-21-06509-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/1722b7117b8b/sensors-21-06509-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/dbad598c912a/sensors-21-06509-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/af66b6f7b11b/sensors-21-06509-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/b1a136b392f6/sensors-21-06509-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/1fc382c2d065/sensors-21-06509-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/a2db414bea95/sensors-21-06509-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a54d/8512729/62ede32e5f83/sensors-21-06509-g009.jpg

Similar Articles

1. A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units.
Sensors (Basel). 2021 Sep 29;21(19):6509. doi: 10.3390/s21196509.
2. Improving neural machine translation for low resource languages through non-parallel corpora: a case study of Egyptian dialect to modern standard Arabic translation.
Sci Rep. 2024 Jan 27;14(1):2265. doi: 10.1038/s41598-023-51090-4.
3. A Neural Machine Translation Model for Arabic Dialects That Utilises Multitask Learning (MTL).
Comput Intell Neurosci. 2018 Dec 10;2018:7534712. doi: 10.1155/2018/7534712. eCollection 2018.
4. Improving neural machine translation with POS-tag features for low-resource language pairs.
Heliyon. 2022 Aug 22;8(8):e10375. doi: 10.1016/j.heliyon.2022.e10375. eCollection 2022 Aug.
5. Hate speech detection with ADHAR: a multi-dialectal hate speech corpus in Arabic.
Front Artif Intell. 2024 May 30;7:1391472. doi: 10.3389/frai.2024.1391472. eCollection 2024.
6. Translation of questionnaires into Arabic in cross-cultural research: techniques and equivalence issues.
J Transcult Nurs. 2013 Oct;24(4):363-70. doi: 10.1177/1043659613493440. Epub 2013 Jul 8.
7. Morphological structure in the Arabic mental lexicon: Parallels between standard and dialectal Arabic.
Lang Cogn Process. 2013 Dec;28(10):1453-1473. doi: 10.1080/01690965.2012.719629. Epub 2012 Oct 31.
8. IADD: An integrated Arabic dialect identification dataset.
Data Brief. 2021 Dec 30;40:107777. doi: 10.1016/j.dib.2021.107777. eCollection 2022 Feb.
9. Semantic textual similarity for modern standard and dialectal Arabic using transfer learning.
PLoS One. 2022 Aug 11;17(8):e0272991. doi: 10.1371/journal.pone.0272991. eCollection 2022.
10. Beyond the Transformer: A Novel Polynomial Inherent Attention (PIA) Model and Its Great Impact on Neural Machine Translation.
Comput Intell Neurosci. 2022 Sep 21;2022:1912750. doi: 10.1155/2022/1912750. eCollection 2022.

References Cited in This Article

1. A Neural Machine Translation Model for Arabic Dialects That Utilises Multitask Learning (MTL).
Comput Intell Neurosci. 2018 Dec 10;2018:7534712. doi: 10.1155/2018/7534712. eCollection 2018.
2. Long short-term memory.
Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.