
Heavyweight Statistical Alignment to Guide Neural Translation.

Affiliations

Natural Language Processing and Knowledge Discovery Laboratory, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam.

Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam.

Publication Information

Comput Intell Neurosci. 2022 Jun 3;2022:6856567. doi: 10.1155/2022/6856567. eCollection 2022.

DOI: 10.1155/2022/6856567
PMID: 35694597
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9187440/
Abstract

Transformer neural models with multihead attention outperform all existing translation models. Nevertheless, some features of traditional statistical models, such as prior alignment between source and target words, prove useful in training state-of-the-art Transformer models. It has been reported that a lightweight prior alignment can effectively guide the one head of the multihead cross-attention sublayer responsible for alignment in Transformer models. In this work, we go a step further and apply heavyweight prior alignments to guide all heads. Specifically, we use a weight of 0.5 for the alignment cost, which is added to the token cost to form the overall training cost of the Transformer model; the alignment cost is defined as the deviation of the attention probability from the prior alignments. Moreover, we increase the role of the prior alignment by computing the attention probability as the average over all heads of the multihead attention sublayer in the penultimate layer of the Transformer model. Experimental results on an English-Vietnamese translation task show that the proposed approach yields superior Transformer-based translation models: our model (25.71 BLEU) outperforms the baseline (21.34) by a large margin of 4.37 BLEU. Case studies of selected translations by native speakers corroborate the automatic evaluation. These results encourage the use of heavyweight prior alignments to improve Transformer-based translation models. This work contributes to the literature on machine translation, especially for low-resource language pairs; since the proposed method is language-independent, it can be applied to other language pairs, including Slavic languages.
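The abstract fixes the weight (0.5) and the head-averaging, but not the exact form of the "deviation" used as the alignment cost. Below is a minimal sketch in LaTeX of one plausible reading, assuming the common guided-alignment choice of cross-entropy between the head-averaged attention and the prior alignment; the symbols (T target length, S source length, H heads, p the prior, a the attention) are our notation, not the paper's.

```latex
% Overall cost = token cost + 0.5 * alignment cost (weight per the abstract);
% the cross-entropy form of the alignment cost is an assumption.
\mathcal{L} = \mathcal{L}_{\mathrm{token}} + 0.5\,\mathcal{L}_{\mathrm{align}},
\qquad
\mathcal{L}_{\mathrm{align}} = -\frac{1}{T}\sum_{i=1}^{T}\sum_{j=1}^{S} p_{ij}\,\log\bar{a}_{ij},
\qquad
\bar{a}_{ij} = \frac{1}{H}\sum_{h=1}^{H} a^{(h)}_{ij}
```

Here a^(h) is head h's cross-attention matrix taken from the penultimate layer, and each row of the prior alignment p is normalized to sum to one.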

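The same formulation can be written as a short training-cost routine. This is a sketch under our assumptions, not the authors' code: the cross-entropy deviation measure, the tensor shapes, and all names (alignment_cost, total_cost, ALIGN_WEIGHT) are hypothetical.

```python
import torch

ALIGN_WEIGHT = 0.5  # weight of the alignment cost, as stated in the abstract

def alignment_cost(attn: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """attn:  (heads, tgt_len, src_len) cross-attention weights from the
              penultimate layer; each row sums to 1 over the source axis.
       prior: (tgt_len, src_len) prior alignment from a statistical aligner,
              row-normalized. Returns the mean cross-entropy deviation of the
              head-averaged attention from the prior (our assumed measure)."""
    avg = attn.mean(dim=0)          # average over ALL heads, per the abstract
    eps = 1e-9                      # numerical safety for the log
    return -(prior * torch.log(avg + eps)).sum(dim=-1).mean()

def total_cost(token_cost: torch.Tensor, attn: torch.Tensor,
               prior: torch.Tensor) -> torch.Tensor:
    # overall training cost = token cost + 0.5 * alignment cost
    return token_cost + ALIGN_WEIGHT * alignment_cost(attn, prior)

# Toy usage with random attention and a diagonal prior alignment.
if __name__ == "__main__":
    heads, tgt_len, src_len = 8, 5, 6
    attn = torch.softmax(torch.randn(heads, tgt_len, src_len), dim=-1)
    prior = torch.zeros(tgt_len, src_len)
    prior[torch.arange(tgt_len), torch.arange(tgt_len)] = 1.0
    token_cost = torch.tensor(2.3)  # stand-in for the usual token-level loss
    print(float(total_cost(token_cost, attn, prior)))
```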

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6ec/9187440/341e4cd76710/CIN2022-6856567.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6ec/9187440/1275aea285cb/CIN2022-6856567.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6ec/9187440/4d4282804b9a/CIN2022-6856567.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6ec/9187440/da4df690b351/CIN2022-6856567.004.jpg

Similar Articles

1
Heavyweight Statistical Alignment to Guide Neural Translation.
Comput Intell Neurosci. 2022 Jun 3;2022:6856567. doi: 10.1155/2022/6856567. eCollection 2022.
2
Mixed-Level Neural Machine Translation.
Comput Intell Neurosci. 2020 Nov 29;2020:8859452. doi: 10.1155/2020/8859452. eCollection 2020.
3
Improving neural machine translation with POS-tag features for low-resource language pairs.
Heliyon. 2022 Aug 22;8(8):e10375. doi: 10.1016/j.heliyon.2022.e10375. eCollection 2022 Aug.
4
Beyond the Transformer: A Novel Polynomial Inherent Attention (PIA) Model and Its Great Impact on Neural Machine Translation.
Comput Intell Neurosci. 2022 Sep 21;2022:1912750. doi: 10.1155/2022/1912750. eCollection 2022.
5
The neural machine translation models for the low-resource Kazakh-English language pair.
PeerJ Comput Sci. 2023 Feb 8;9:e1224. doi: 10.7717/peerj-cs.1224. eCollection 2023.
6
Adoption of Wireless Network and Artificial Intelligence Algorithm in Chinese-English Tense Translation.
Comput Intell Neurosci. 2022 Jun 11;2022:1662311. doi: 10.1155/2022/1662311. eCollection 2022.
7
Analysis of Chinese Machine Translation Training Based on Deep Learning Technology.
Comput Intell Neurosci. 2022 Aug 2;2022:6502831. doi: 10.1155/2022/6502831. eCollection 2022.
8
English-Chinese Machine Translation Based on Transfer Learning and Chinese-English Corpus.
Comput Intell Neurosci. 2022 Sep 27;2022:1563731. doi: 10.1155/2022/1563731. eCollection 2022.
9
Adaptation of machine translation for multilingual information retrieval in the medical domain.
Artif Intell Med. 2014 Jul;61(3):165-85. doi: 10.1016/j.artmed.2014.01.004. Epub 2014 Feb 5.
10
An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention.
Comput Intell Neurosci. 2022 Jun 21;2022:2998242. doi: 10.1155/2022/2998242. eCollection 2022.

References Cited in This Article

1
Mixed-Level Neural Machine Translation.
Comput Intell Neurosci. 2020 Nov 29;2020:8859452. doi: 10.1155/2020/8859452. eCollection 2020.
2
State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis.
Nat Commun. 2020 Nov 4;11(1):5575. doi: 10.1038/s41467-020-19266-y.