Image Captioning via Dynamic Path Customization.

Authors

Ma Yiwei, Ji Jiayi, Sun Xiaoshuai, Zhou Yiyi, Hong Xiaopeng, Wu Yongjian, Ji Rongrong

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6203-6217. doi: 10.1109/TNNLS.2024.3409354. Epub 2025 Apr 4.

DOI: 10.1109/TNNLS.2024.3409354
PMID: 39083387

Abstract

This article explores a novel dynamic network for vision and language (V&L) tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art (SOTA) approaches are static and handcrafted networks, which not only heavily rely on expert knowledge but also ignore the semantic diversity of input samples, therefore resulting in suboptimal performance. To address these issues, we propose a novel Dynamic Transformer Network (DTNet) for image captioning, which dynamically assigns customized paths to different samples, leading to discriminative yet accurate captions. Specifically, to build a rich routing space and improve routing efficiency, we introduce five types of basic cells and group them into two separate routing spaces according to their operating domains, i.e., spatial and channel. Then, we design a Spatial-Channel Joint Router (SCJR), which endows the model with the capability of path customization based on both spatial and channel information of the input sample. To validate the effectiveness of our proposed DTNet, we conduct extensive experiments on the MS-COCO dataset and achieve new SOTA performance on both the Karpathy split and the online test server. The source code is publicly available at https://github.com/xmu-xiaoma666/DTNet.
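
The abstract does not spell out the five basic cells or the internals of the SCJR, so the following is only a minimal sketch of the general idea of input-dependent routing, not the DTNet implementation (the authors' code is in the repository linked above). Everything in it is an assumption made for illustration: the three toy cells, the names `joint_router` and `routed_layer`, the random router weights, and the choice of a softmax-weighted mixture of cell outputs. It shows a router that scores each candidate cell from two pooled summaries of the input, one per token (spatial) and one per channel, so that different inputs receive different path weights.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(values):
    return sum(values) / len(values)


# Hypothetical stand-ins for "basic cells". Each maps a grid of
# N tokens x C channels (list of lists) to a grid of the same shape.
def identity_cell(x):
    return [row[:] for row in x]


def token_mix_cell(x):
    # Spatial-domain toy: every token is replaced by the mean token.
    avg = [mean([row[c] for row in x]) for c in range(len(x[0]))]
    return [avg[:] for _ in x]


def channel_scale_cell(x):
    # Channel-domain toy: each channel is gated by a sigmoid of its mean.
    gates = [1.0 / (1.0 + math.exp(-mean([row[c] for row in x])))
             for c in range(len(x[0]))]
    return [[v * g for v, g in zip(row, gates)] for row in x]


def joint_router(x, w_spatial, w_channel):
    """Return one softmax weight per cell (assumed router form).

    The score of each cell is a linear function of a per-token summary
    (mean over channels) plus a per-channel summary (mean over tokens).
    """
    spatial_summary = [mean(row) for row in x]
    channel_summary = [mean([row[c] for row in x]) for c in range(len(x[0]))]
    scores = []
    for ws, wc in zip(w_spatial, w_channel):
        score = sum(a * b for a, b in zip(ws, spatial_summary))
        score += sum(a * b for a, b in zip(wc, channel_summary))
        scores.append(score)
    return softmax(scores)


def routed_layer(x, cells, w_spatial, w_channel):
    """Mix the outputs of all cells using the input-dependent weights."""
    weights = joint_router(x, w_spatial, w_channel)
    outputs = [cell(x) for cell in cells]
    mixed = [
        [sum(w * out[i][j] for w, out in zip(weights, outputs))
         for j in range(len(x[0]))]
        for i in range(len(x))
    ]
    return mixed, weights


random.seed(0)
N, C = 4, 3  # tokens and channels, chosen arbitrarily for the demo
cells = [identity_cell, token_mix_cell, channel_scale_cell]
w_spatial = [[random.uniform(-1, 1) for _ in range(N)] for _ in cells]
w_channel = [[random.uniform(-1, 1) for _ in range(C)] for _ in cells]

x_a = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
x_b = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
out_a, weights_a = routed_layer(x_a, cells, w_spatial, w_channel)
out_b, weights_b = routed_layer(x_b, cells, w_spatial, w_channel)
```

Comparing `weights_a` with `weights_b` shows the routing changing with the input while the set of cells stays fixed. In a trained model the router weights would be learned jointly with the cells rather than drawn at random as they are here.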


Similar articles

1. Image Captioning via Dynamic Path Customization.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6203-6217. doi: 10.1109/TNNLS.2024.3409354. Epub 2025 Apr 4.
2. Auto-Encoding and Distilling Scene Graphs for Image Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2313-2327. doi: 10.1109/TPAMI.2020.3042192. Epub 2022 Apr 1.
3. Multi-level semantic-aware transformer for image captioning.
   Neural Netw. 2025 Jul;187:107390. doi: 10.1016/j.neunet.2025.107390. Epub 2025 Mar 17.
4. Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.
   Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.
5. Syntax Customized Video Captioning by Imitating Exemplar Sentences.
   IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):10209-10221. doi: 10.1109/TPAMI.2021.3131618. Epub 2022 Nov 7.
6. Advancing Causal Intervention in Image Captioning With Causal Prompt.
   IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12631-12642. doi: 10.1109/TNNLS.2024.3487200.
7. Image Captioning Based on Semantic Scenes.
   Entropy (Basel). 2024 Oct 18;26(10):876. doi: 10.3390/e26100876.
8. Visual Cluster Grounding for Image Captioning.
   IEEE Trans Image Process. 2022;31:3920-3934. doi: 10.1109/TIP.2022.3177318. Epub 2022 Jun 9.
9. Dual Global Enhanced Transformer for image captioning.
   Neural Netw. 2022 Apr;148:129-141. doi: 10.1016/j.neunet.2022.01.011. Epub 2022 Jan 21.
10. Deconfounded Image Captioning: A Causal Retrospect.
    IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12996-13010. doi: 10.1109/TPAMI.2021.3121705. Epub 2023 Oct 3.