Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling

Authors

Guo Junjun, Su Rui, Ye Junjie

Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China.

Publication

Neural Netw. 2024 Oct;178:106403. doi: 10.1016/j.neunet.2024.106403. Epub 2024 May 23.

DOI: 10.1016/j.neunet.2024.106403
PMID: 38815470
Abstract

The goal of multi-modal neural machine translation (MNMT) is to incorporate language-agnostic visual information into text to enhance the performance of machine translation. However, due to the inherent differences between image and text, these two modalities inevitably suffer from semantic mismatch problems. To tackle this issue, this paper adopts a multi-grained visual pivot-guided multi-modal fusion strategy with cross-modal contrastive disentangling to eliminate the linguistic gaps between different languages. By using the disentangled multi-grained visual information as a cross-lingual pivot, we can enhance the alignment between different languages and improve the performance of MNMT. We first introduce text-guided stacked cross-modal disentangling modules to progressively disentangle image into two types of visual information: MT-related visual and background information. Then we effectively integrate these two kinds of multi-grained visual elements to assist target sentence generation. Extensive experiments on four benchmark MNMT datasets are conducted, and the results demonstrate that our proposed approach achieves significant improvement over the other state-of-the-art (SOTA) approaches on all test sets. The in-depth analysis highlights the benefits of text-guided cross-modal disentangling and visual pivot-based multi-modal fusion strategies in MNMT. We release the code at https://github.com/nlp-mnmt/ConVisPiv-MNMT.
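The text-guided disentangling step described in the abstract can be illustrated with a minimal sketch: a text-conditioned gate splits an image feature into an MT-related component and a background component, whose similarities to the text embedding a contrastive objective would then push apart during training. This is a hypothetical simplification for intuition only; the single-layer gate, the function names, and the dimensions are assumptions, not the paper's actual stacked modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def text_guided_disentangle(img_feat, txt_feat, W):
    """Split an image feature into MT-related and background parts
    via a text-conditioned gate (hypothetical one-layer variant)."""
    gate = sigmoid(W @ np.concatenate([img_feat, txt_feat]))
    mt_visual = gate * img_feat            # translation-relevant component
    background = (1.0 - gate) * img_feat   # residual background component
    return mt_visual, background

def cosine(a, b):
    """Cosine similarity, the score a contrastive loss would compare."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

d = 8
img = rng.normal(size=d)                   # toy image feature
txt = rng.normal(size=d)                   # toy text feature
W = rng.normal(size=(d, 2 * d)) * 0.1      # gate parameters

mt_vis, bg = text_guided_disentangle(img, txt, W)

# A contrastive objective would raise cos(mt_vis, txt) and lower
# cos(bg, txt) during training; here we only compute the two scores.
print(cosine(mt_vis, txt), cosine(bg, txt))
```

Note that the two components sum back to the original image feature, so the gate only redistributes information rather than discarding it; the actual model stacks several such modules and fuses the MT-related component into decoding as a cross-lingual pivot.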


Similar Articles

1. Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling.
   Neural Netw. 2024 Oct;178:106403. doi: 10.1016/j.neunet.2024.106403. Epub 2024 May 23.
2. An error analysis for image-based multi-modal neural machine translation.
   Mach Transl. 2019;33(1):155-177. doi: 10.1007/s10590-019-09226-9. Epub 2019 Apr 8.
3. Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
   Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
4. Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval.
   Med Image Anal. 2024 Jul;95:103163. doi: 10.1016/j.media.2024.103163. Epub 2024 Apr 9.
5. Efficient Token-Guided Image-Text Retrieval With Consistent Multimodal Contrastive Training.
   IEEE Trans Image Process. 2023;32:3622-3633. doi: 10.1109/TIP.2023.3286710. Epub 2023 Jul 3.
6. Hybrid Attention Network for Language-Based Person Search.
   Sensors (Basel). 2020 Sep 15;20(18):5279. doi: 10.3390/s20185279.
7. Detecting and Grounding Multi-Modal Media Manipulation and Beyond.
   IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5556-5574. doi: 10.1109/TPAMI.2024.3367749. Epub 2024 Jul 2.
8. Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning.
   IEEE Trans Med Imaging. 2024 Jul;43(7):2657-2669. doi: 10.1109/TMI.2024.3372638. Epub 2024 Jul 1.
9. CLIP-Driven Fine-Grained Text-Image Person Re-Identification.
   IEEE Trans Image Process. 2023;32:6032-6046. doi: 10.1109/TIP.2023.3327924. Epub 2023 Nov 7.
10. Cross-Modal Search for Social Networks via Adversarial Learning.
    Comput Intell Neurosci. 2020 Jul 11;2020:7834953. doi: 10.1155/2020/7834953. eCollection 2020.

Cited By

1. The analysis of learning investment effect for artificial intelligence English translation model based on deep neural network.
   Sci Rep. 2025 Jul 19;15(1):26277. doi: 10.1038/s41598-025-11282-6.
2. Cross-language dissemination of Chinese classical literature using multimodal deep learning and artificial intelligence.
   Sci Rep. 2025 Jul 1;15(1):21648. doi: 10.1038/s41598-025-05921-1.
3. Counterclockwise block-by-block knowledge distillation for neural network compression.
   Sci Rep. 2025 Apr 3;15(1):11369. doi: 10.1038/s41598-025-91152-3.