An error analysis for image-based multi-modal neural machine translation.

Author information

Calixto Iacer, Liu Qun

Affiliations

University of Amsterdam, ILLC, Science Park, Amsterdam, Netherlands.

Huawei Noah's Ark Lab, Hong Kong, Hong Kong.

Publication information

Mach Transl. 2019;33(1):155-177. doi: 10.1007/s10590-019-09226-9. Epub 2019 Apr 8.

DOI: 10.1007/s10590-019-09226-9
PMID: 31281206
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6579783/
Abstract

In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models, which differ in the image features they use: one type encodes an image globally, i.e. there is one feature vector representing the entire image, whereas the other encodes spatial information, i.e. there are multiple feature vectors, each encoding a different portion of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both [...]. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only are translations of terms with a strong visual connotation improved, but almost all kinds of errors decrease when using multi-modal models.
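
The distinction the abstract draws between the two kinds of image features can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: the abstract does not say how either representation is computed, so the grid of region vectors, the function names `spatial_features` and `global_feature`, and the use of mean pooling to obtain a single vector are all hypothetical stand-ins, not the authors' method.

```python
from typing import List

Vector = List[float]


def spatial_features(feature_map: List[List[Vector]]) -> List[Vector]:
    """Flatten an H x W grid of D-dim vectors into H*W per-region vectors."""
    return [cell for row in feature_map for cell in row]


def global_feature(feature_map: List[List[Vector]]) -> Vector:
    """Collapse the grid into one D-dim vector by averaging over all regions.

    Mean pooling is an assumption made for this sketch only.
    """
    cells = spatial_features(feature_map)
    dim = len(cells[0])
    return [sum(c[d] for c in cells) / len(cells) for d in range(dim)]


# Toy 2 x 2 grid with 3-dimensional features per image region (made-up numbers).
toy_map = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[1.0, 4.0, 2.0], [3.0, 4.0, 2.0]],
]

local_vecs = spatial_features(toy_map)  # four vectors, one per image region
global_vec = global_feature(toy_map)    # one vector for the whole image

print(len(local_vecs), "spatial vectors of size", len(local_vecs[0]))
print("1 global vector of size", len(global_vec))
```

A model given the spatial representation can weight individual regions differently at each decoding step, whereas a model given the global representation sees the same single vector throughout, which is why the abstract describes the latter models as simpler.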
