Video Pivoting Unsupervised Multi-Modal Machine Translation.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3918-3932. doi: 10.1109/TPAMI.2022.3181116. Epub 2023 Feb 3.

DOI: 10.1109/TPAMI.2022.3181116
PMID: 35679386
Abstract

The main challenge in unsupervised machine translation (UMT) is to associate source and target sentences in the latent space. Because people who speak different languages share biologically similar visual systems, various unsupervised multi-modal machine translation (UMMT) models have been proposed that improve UMT performance by using the visual content of natural images to facilitate alignment. Relation information is often an important part of a sentence's semantics, and compared with images, videos better present the interactions between objects and the ways in which an object changes over time. However, current state-of-the-art methods only explore scene-level or object-level information from images without explicitly modeling object relations; they are therefore sensitive to spurious correlations, which poses a new challenge for UMMT models. In this paper, we employ a spatial-temporal graph obtained from videos to exploit object interactions in space and time for disambiguation and to promote latent space alignment in UMMT. Our model employs multi-modal back-translation and features pseudo-visual pivoting, in which we learn a shared multilingual visual-semantic embedding space and incorporate visually pivoted captioning as additional weak supervision. Experimental results on the VATEX Translation 2020 and HowToWorld datasets validate the translation capabilities of our model at both the sentence level and the word level, and show that it generalizes well when videos are not available during the testing phase.
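
To make the idea of a spatial-temporal graph over video objects more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the graph is built, so the input format (a list of object labels per frame), the function name `build_spatio_temporal_graph`, and both edge rules are assumptions chosen for illustration. Spatial edges link objects that appear in the same frame, and temporal edges link an object to its occurrence in the next frame.

```python
from itertools import combinations


def build_spatio_temporal_graph(frames):
    """Build a toy spatio-temporal graph.

    `frames` is a hypothetical input: one list of object labels per video
    frame, with each label appearing at most once per frame.
    Returns (nodes, spatial_edges, temporal_edges).
    """
    # One node per (frame index, object label) detection.
    nodes = [(t, label) for t, labels in enumerate(frames) for label in labels]
    spatial_edges = []
    temporal_edges = []
    for t, labels in enumerate(frames):
        # Spatial edges: every pair of objects that co-occur in frame t.
        for a, b in combinations(labels, 2):
            spatial_edges.append(((t, a), (t, b)))
        # Temporal edges: the same label present in frame t and frame t + 1.
        if t + 1 < len(frames):
            for label in labels:
                if label in frames[t + 1]:
                    temporal_edges.append(((t, label), (t + 1, label)))
    return nodes, spatial_edges, temporal_edges


# Illustrative data only, not taken from VATEX or HowToWorld.
frames = [["person", "ball"], ["person", "ball", "dog"], ["person", "dog"]]
nodes, spatial_edges, temporal_edges = build_spatio_temporal_graph(frames)
print(len(nodes), len(spatial_edges), len(temporal_edges))
```

In a real system the nodes would carry visual features from an object detector and the edges would feed a graph encoder; this sketch only shows the graph structure itself.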

Similar articles

1. Video Pivoting Unsupervised Multi-Modal Machine Translation.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3918-3932. doi: 10.1109/TPAMI.2022.3181116. Epub 2023 Feb 3.
2. Video Captioning with Object-Aware Spatio-Temporal Correlation and Aggregation.
   IEEE Trans Image Process. 2020 Apr 27. doi: 10.1109/TIP.2020.2988435.
3. Cross-Modal Graph With Meta Concepts for Video Captioning.
   IEEE Trans Image Process. 2022;31:5150-5162. doi: 10.1109/TIP.2022.3192709. Epub 2022 Aug 2.
4. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation.
   IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.
5. SibNet: Sibling Convolutional Encoder for Video Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):3259-3272. doi: 10.1109/TPAMI.2019.2940007. Epub 2021 Aug 4.
6. Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling.
   Neural Netw. 2024 Oct;178:106403. doi: 10.1016/j.neunet.2024.106403. Epub 2024 May 23.
7. Exploiting Cross-Modal Prediction and Relation Consistency for Semisupervised Image Captioning.
   IEEE Trans Cybern. 2024 Feb;54(2):890-902. doi: 10.1109/TCYB.2022.3156367. Epub 2024 Jan 17.
8. SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4926-4943. doi: 10.1109/TPAMI.2024.3365104. Epub 2024 Jun 5.
9. ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation.
   IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5712-5724. doi: 10.1109/TPAMI.2024.3371376. Epub 2024 Jul 2.
10. Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval.
    IEEE Trans Image Process. 2022;31:7154-7164. doi: 10.1109/TIP.2022.3220051. Epub 2022 Nov 16.

Cited by

1. The impact of social security systems on public health outcomes: an economic perspective on machine translation applications.
   Front Public Health. 2025 Jul 10;13:1597381. doi: 10.3389/fpubh.2025.1597381. eCollection 2025.
2. The analysis of learning investment effect for artificial intelligence English translation model based on deep neural network.
   Sci Rep. 2025 Jul 19;15(1):26277. doi: 10.1038/s41598-025-11282-6.
3. Cross-language dissemination of Chinese classical literature using multimodal deep learning and artificial intelligence.
   Sci Rep. 2025 Jul 1;15(1):21648. doi: 10.1038/s41598-025-05921-1.
4. Syntactic complexity recognition and analysis in Chinese-English machine translation: A comparative study based on the BLSTM-CRF model.
   PLoS One. 2025 Jun 12;20(6):e0325721. doi: 10.1371/journal.pone.0325721. eCollection 2025.
5. An Underwater Image Enhancement Method for a Preprocessing Framework Based on Generative Adversarial Network.
   Sensors (Basel). 2023 Jun 21;23(13):5774. doi: 10.3390/s23135774.
6. A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction.
   Sensors (Basel). 2023 May 17;23(10):4828. doi: 10.3390/s23104828.
7. Detection and Classification of Histopathological Breast Images Using a Fusion of CNN Frameworks.
   Diagnostics (Basel). 2023 May 11;13(10):1700. doi: 10.3390/diagnostics13101700.
8. Dynamic Path Planning of AGV Based on Kinematical Constraint A* Algorithm and Following DWA Fusion Algorithms.
   Sensors (Basel). 2023 Apr 19;23(8):4102. doi: 10.3390/s23084102.
9. Robust thermal infrared tracking via an adaptively multi-feature fusion model.
   Neural Comput Appl. 2023;35(4):3423-3434. doi: 10.1007/s00521-022-07867-1. Epub 2022 Oct 12.