

Multilevel Attention Networks and Policy Reinforcement Learning for Image Caption Generation

Authors

Zhou Zhibo, Zhang Xiaoming, Li Zhoujun, Huang Feiran, Xu Jie

Affiliations

School of Computer Science and Engineering, Beihang University, Beijing, China.

School of Cyber Science and Technology, Beihang University, Beijing, China.

Publication

Big Data. 2022 Dec;10(6):481-492. doi: 10.1089/big.2021.0049. Epub 2021 Nov 2.

DOI: 10.1089/big.2021.0049
PMID: 34726529
Abstract

The analysis of large-scale multimodal data has become very popular recently. Image captioning, whose goal is to describe the content of image with natural language automatically, is an essential and challenging task in artificial intelligence. Commonly, most existing image caption methods utilize the mixture of Convolutional Neural Network and Recurrent Neural Network framework. These methods either pay attention to global representation at the image level or only focus on the specific concepts, such as regions and objects. To make the most of characteristics about a given image, in this study, we present a novel model named Multilevel Attention Networks and Policy Reinforcement Learning for image caption generation. Specifically, our model is composed of a multilevel attention network module and a policy reinforcement learning module. In the multilevel attention network, the object-attention network aims to capture global and local details about objects, whereas the region-attention network obtains global and local features about regions. After that, a policy reinforcement learning algorithm is adopted to overcome the exposure bias problem in the training phase and solve the loss-evaluation mismatching problem at the caption generation stage. With the attention network and policy algorithm, our model can automatically generate accurate and natural sentences for any particular image. We carry out extensive experiments on the MSCOCO and Flickr30k data sets, demonstrating that our model is superior to other competitive methods.
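The two ingredients the abstract describes — soft attention that pools region/object features for the decoder, and a policy-gradient update whose baseline addresses exposure bias and the loss-evaluation mismatch — can be sketched compactly. The following NumPy sketch is illustrative only, not the authors' implementation: all function names are hypothetical, and the self-critical baseline (sampled reward minus greedy-decoding reward, as in SCST-style training) is one common instantiation of the policy-reinforcement idea.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, features):
    """Soft attention: score each feature vector (a region or object)
    against the decoder query, normalize, and pool into one context vector."""
    scores = features @ query        # (N,) similarity of each feature to query
    weights = softmax(scores)        # (N,) attention distribution, sums to 1
    context = weights @ features     # (D,) weighted average of features
    return context, weights

def scst_advantage(sampled_reward, greedy_reward):
    """Self-critical advantage: how much better the sampled caption scored
    (e.g. by CIDEr) than the greedy test-time caption. Scaling the log-prob
    gradient by this quantity trains on the sequence-level metric directly."""
    return sampled_reward - greedy_reward

# Toy example: 4 region features of dimension 3.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3))
q = rng.standard_normal(3)
ctx, w = attend(q, feats)
```

Because the reward enters only through a scalar advantage, the same update rule applies whether the reward is CIDEr, BLEU, or any other non-differentiable caption metric.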


Similar Articles

1. Multilevel Attention Networks and Policy Reinforcement Learning for Image Caption Generation. Big Data. 2022 Dec;10(6):481-492. doi: 10.1089/big.2021.0049. Epub 2021 Nov 2.
2. Chinese Image Caption Generation via Visual Attention and Topic Modeling. IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
3. Dual Global Enhanced Transformer for image captioning. Neural Netw. 2022 Apr;148:129-141. doi: 10.1016/j.neunet.2022.01.011. Epub 2022 Jan 21.
4. Translating medical image to radiological report: Adaptive multilevel multi-attention approach. Comput Methods Programs Biomed. 2022 Jun;221:106853. doi: 10.1016/j.cmpb.2022.106853. Epub 2022 May 4.
5. A Multilevel Transfer Learning Technique and LSTM Framework for Generating Medical Captions for Limited CT and DBT Images. J Digit Imaging. 2022 Jun;35(3):564-580. doi: 10.1007/s10278-021-00567-7. Epub 2022 Feb 25.
6. Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning. Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.
7. An image caption model based on attention mechanism and deep reinforcement learning. Front Neurosci. 2023 Oct 5;17:1270850. doi: 10.3389/fnins.2023.1270850. eCollection 2023.
8. Adaptive Semantic-Enhanced Transformer for Image Captioning. IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):1785-1796. doi: 10.1109/TNNLS.2022.3185320. Epub 2024 Feb 5.
9. More is Better: Precise and Detailed Image Captioning Using Online Positive Recall and Missing Concepts Mining. IEEE Trans Image Process. 2019 Jan;28(1):32-44. doi: 10.1109/TIP.2018.2855415. Epub 2018 Jul 12.
10. Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory. Neural Process Lett. 2019 Aug;50(1):103-119. doi: 10.1007/s11063-018-09973-5. Epub 2019 Jan 11.