
An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention.

Affiliations

School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China.

Publication information

Comput Intell Neurosci. 2022 Jun 21;2022:2998242. doi: 10.1155/2022/2998242. eCollection 2022.

DOI: 10.1155/2022/2998242
PMID: 35774445
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9239798/
Abstract

Transformer-based models have gained significant advances in neural machine translation (NMT). The main component of the transformer is the multihead attention layer. In theory, more heads enhance the expressive power of the NMT model. But this is not always the case in practice. On the one hand, the computations of each head attention are conducted in the same subspace, without considering the different subspaces of all the tokens. On the other hand, the low-rank bottleneck may occur, when the number of heads surpasses a threshold. To address the low-rank bottleneck, the two mainstream methods make the head size equal to the sequence length and complicate the distribution of self-attention heads. However, these methods are challenged by the variable sequence length in the corpus and the sheer number of parameters to be learned. Therefore, this paper proposes the interacting-head attention mechanism, which induces deeper and wider interactions across the attention heads by low-dimension computations in different subspaces of all the tokens, and chooses the appropriate number of heads to avoid low-rank bottleneck. The proposed model was tested on machine translation tasks of IWSLT2016 DE-EN, WMT17 EN-DE, and WMT17 EN-CS. Compared to the original multihead attention, our model improved the performance by 2.78 BLEU/0.85 WER/2.90 METEOR/2.65 ROUGE_L/0.29 CIDEr/2.97 YiSi and 2.43 BLEU/1.38 WER/3.05 METEOR/2.70 ROUGE_L/0.30 CIDEr/3.59 YiSi on the evaluation set and the test set, respectively, for IWSLT2016 DE-EN, 2.31 BLEU/5.94 WER/1.46 METEOR/1.35 ROUGE_L/0.07 CIDEr/0.33 YiSi and 1.62 BLEU/6.04 WER/1.39 METEOR/0.11 CIDEr/0.87 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-DE, and 3.87 BLEU/3.05 WER/9.22 METEOR/3.81 ROUGE_L/0.36 CIDEr/4.14 YiSi and 4.62 BLEU/2.41 WER/9.82 METEOR/4.82 ROUGE_L/0.44 CIDEr/5.25 YiSi on the evaluation set and newstest2014, respectively, for WMT17 EN-CS.
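
For readers less familiar with the mechanism the abstract refers to, the following is a minimal NumPy sketch of standard multi-head self-attention (the baseline the paper improves on) together with the per-head dimension arithmetic behind the "low-rank bottleneck". It is an illustration under assumed dimensions (d_model = 512, 8 heads, a 128-token sequence); the function name and shapes are illustrative choices, and it does not reproduce the paper's interacting-head attention, whose details are not given in the abstract.

```python
# Minimal sketch of standard multi-head self-attention (Vaswani et al., 2017).
# Dimensions and names are illustrative assumptions, not the paper's setup.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads  # per-head size shrinks as heads grow

    # The "low-rank bottleneck" the abstract mentions: each head's
    # (seq_len x seq_len) attention map is built from rank-<=d_head factors,
    # so once d_head < seq_len it can no longer have full rank.
    if d_head < seq_len:
        print(f"low-rank regime: d_head={d_head} < seq_len={seq_len}")

    # Project and split into heads: (num_heads, seq_len, d_head).
    def split(W):
        return (X @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(Wq), split(Wk), split(Wv)

    # Each head attends independently in its own d_head-dimensional subspace.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (h, seq, seq)
    context = softmax(scores) @ V                          # (h, seq, d_head)

    # Concatenate heads and project back to d_model.
    out = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Example: d_model=512 with 8 heads gives d_head=64, so any sentence longer
# than 64 tokens already sits in the low-rank regime described above.
rng = np.random.default_rng(0)
d_model, seq_len = 512, 128
W = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)]
X = rng.standard_normal((seq_len, d_model))
print(multihead_attention(X, *W, num_heads=8).shape)  # (128, 512)
```

Per the abstract, the proposed interacting-head attention departs from this baseline by letting heads interact through low-dimensional computations across the tokens' different subspaces, rather than computing each head independently, and by choosing a head count that avoids the low-rank regime illustrated above.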


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/610487b98872/CIN2022-2998242.alg.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/727f5489ea4e/CIN2022-2998242.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/5512a2b63bda/CIN2022-2998242.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/01289d7e3057/CIN2022-2998242.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/3d6ebafd697d/CIN2022-2998242.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/663b675cad83/CIN2022-2998242.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8451/9239798/57551b881d02/CIN2022-2998242.006.jpg

Similar articles

1. An Improved Transformer-Based Neural Machine Translation Strategy: Interacting-Head Attention.
Comput Intell Neurosci. 2022 Jun 21;2022:2998242. doi: 10.1155/2022/2998242. eCollection 2022.
2. Heavyweight Statistical Alignment to Guide Neural Translation.
Comput Intell Neurosci. 2022 Jun 3;2022:6856567. doi: 10.1155/2022/6856567. eCollection 2022.
3. English-Chinese Machine Translation Based on Transfer Learning and Chinese-English Corpus.
Comput Intell Neurosci. 2022 Sep 27;2022:1563731. doi: 10.1155/2022/1563731. eCollection 2022.
4. Quantum neural network based machine translator for Hindi to English.
ScientificWorldJournal. 2014;2014:485737. doi: 10.1155/2014/485737. Epub 2014 Feb 27.
5. The neural machine translation models for the low-resource Kazakh-English language pair.
PeerJ Comput Sci. 2023 Feb 8;9:e1224. doi: 10.7717/peerj-cs.1224. eCollection 2023.
6. Video captioning based on vision transformer and reinforcement learning.
PeerJ Comput Sci. 2022 Mar 16;8:e916. doi: 10.7717/peerj-cs.916. eCollection 2022.
7. Automatic generation of conclusions from neuroradiology MRI reports through natural language processing.
Neuroradiology. 2024 Apr;66(4):477-485. doi: 10.1007/s00234-024-03312-3. Epub 2024 Feb 21.
8. Efficient incremental training using a novel NMT-SMT hybrid framework for translation of low-resource languages.
Front Artif Intell. 2024 Sep 25;7:1381290. doi: 10.3389/frai.2024.1381290. eCollection 2024.
9. Improving neural machine translation with POS-tag features for low-resource language pairs.
Heliyon. 2022 Aug 22;8(8):e10375. doi: 10.1016/j.heliyon.2022.e10375. eCollection 2022 Aug.
10. Beyond the Transformer: A Novel Polynomial Inherent Attention (PIA) Model and Its Great Impact on Neural Machine Translation.
Comput Intell Neurosci. 2022 Sep 21;2022:1912750. doi: 10.1155/2022/1912750. eCollection 2022.