Social Image Captioning: Exploring Visual Attention and User Attention.

Affiliations

College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266555, China.

First Research Institute of the Ministry of Public Security of PRC, Beijing 100048, China.

Publication information

Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.

DOI:10.3390/s18020646
PMID:29470409
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5855536/
Abstract

Captioning images in natural language is an emerging trend. However, social images, which come with a set of user-contributed tags, have rarely been investigated for this task. User-contributed tags, which can reflect user attention, have been neglected in conventional image captioning, and most existing image captioning models cannot be applied directly to social image captioning. In this work, a dual attention model is proposed for social image captioning that combines visual attention and user attention simultaneously. Visual attention is used to compress a large amount of salient visual information, while user attention is applied to adjust the description of the social image according to its user-contributed tags. Experiments conducted on the Microsoft (MS) COCO dataset demonstrate the superiority of the proposed dual attention method.
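
The idea of attending to image regions and to user tags at the same decoding step can be illustrated with a small NumPy sketch. This is not the paper's architecture: the bilinear scoring function, the concatenation of the two context vectors, and all names (`attend`, `dual_attention_context`, `W_v`, `W_u`) and dimensions are assumptions chosen only to show how two soft-attention branches might be combined.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(features, query, W):
    """Soft attention (assumed bilinear scoring, not taken from the paper).

    features: (n, d) array, one row per image region or per tag embedding
    query:    (h,) array, e.g. the decoder's hidden state
    W:        (d, h) projection used to score each row against the query
    Returns the weighted sum of rows (d,) and the attention weights (n,).
    """
    scores = features @ W @ query
    weights = softmax(scores)
    context = weights @ features
    return context, weights

def dual_attention_context(visual_feats, tag_embeds, hidden, W_v, W_u):
    # Visual branch: summarize many region features into one vector.
    visual_ctx, visual_w = attend(visual_feats, hidden, W_v)
    # User branch: weight the user-contributed tags by relevance to the query.
    user_ctx, user_w = attend(tag_embeds, hidden, W_u)
    # Hypothetical fusion: concatenate both contexts for the decoder.
    return np.concatenate([visual_ctx, user_ctx]), visual_w, user_w

# Toy inputs with made-up sizes: 49 regions, 5 tags, feature size 8.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(49, 8))
tag_embeds = rng.normal(size=(5, 8))
hidden = rng.normal(size=8)
W_v = rng.normal(size=(8, 8))
W_u = rng.normal(size=(8, 8))

context, visual_w, user_w = dual_attention_context(
    visual_feats, tag_embeds, hidden, W_v, W_u
)
print(context.shape, visual_w.shape, user_w.shape)
```

In a full captioning model the fused context would feed a language decoder at each step; here the weights are random, so the sketch only demonstrates the shapes and the flow of the two attention branches.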

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ccc/5855536/887765b81768/sensors-18-00646-g001.jpg

Similar articles

1. Social Image Captioning: Exploring Visual Attention and User Attention.
   Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.
2. Context-Aware Visual Policy Network for Fine-Grained Image Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):710-722. doi: 10.1109/TPAMI.2019.2909864. Epub 2022 Jan 7.
3. Visual Cluster Grounding for Image Captioning.
   IEEE Trans Image Process. 2022;31:3920-3934. doi: 10.1109/TIP.2022.3177318. Epub 2022 Jun 9.
4. Hierarchical LSTMs with Adaptive Attention for Visual Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1112-1131. doi: 10.1109/TPAMI.2019.2894139. Epub 2019 Jan 21.
5. Arabic Captioning for Images of Clothing Using Deep Learning.
   Sensors (Basel). 2023 Apr 7;23(8):3783. doi: 10.3390/s23083783.
6. Dual Position Relationship Transformer for Image Captioning.
   Big Data. 2022 Dec;10(6):515-527. doi: 10.1089/big.2021.0262. Epub 2022 Jan 4.
7. Chinese Image Caption Generation via Visual Attention and Topic Modeling.
   IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
8. Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture.
   Sci Rep. 2024 Sep 5;14(1):20762. doi: 10.1038/s41598-024-69664-1.
9. RefCap: image captioning with referent objects attributes.
   Sci Rep. 2023 Dec 7;13(1):21577. doi: 10.1038/s41598-023-48916-6.
10. Dual Global Enhanced Transformer for image captioning.
    Neural Netw. 2022 Apr;148:129-141. doi: 10.1016/j.neunet.2022.01.011. Epub 2022 Jan 21.

Cited by

1. Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images.
   J Imaging. 2022 Oct 22;8(11):294. doi: 10.3390/jimaging8110294.