
Social Image Captioning: Exploring Visual Attention and User Attention.

Affiliations

College of Computer & Communication Engineering, China University of Petroleum (East China), Qingdao 266555, China.

First Research Institute of the Ministry of Public Security of PRC, Beijing 100048, China.

Publication Information

Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.

Abstract

Image captioning with natural language has become an emerging trend. However, the social image, associated with a set of user-contributed tags, has rarely been investigated for a similar task. The user-contributed tags, which can reflect user attention, have been neglected in conventional image captioning, and most existing image captioning models cannot be applied directly to social image captioning. In this work, a dual attention model is proposed for social image captioning by combining visual attention and user attention simultaneously. Visual attention is used to compress a large amount of salient visual information, while user attention is applied to adjust the description of the social images with user-contributed tags. Experiments conducted on the Microsoft (MS) COCO dataset demonstrate the superiority of the proposed dual attention method.
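The abstract does not specify the exact architecture, but the core idea it describes — one attention pass over image region features (visual attention) and a second over user-tag embeddings (user attention), fused to condition the caption decoder — can be sketched minimally in NumPy. All names, shapes, and the choice of dot-product attention with concatenation here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(query, keys):
    """Scaled dot-product attention (an assumed form): weight each
    key vector by its similarity to the query, return the weighted sum."""
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)
    return weights @ keys

def dual_attention_context(hidden, visual_feats, tag_embeds):
    """Fuse a visual context and a user (tag) context into one vector
    that would condition the next word of the caption."""
    visual_ctx = attend(hidden, visual_feats)  # visual attention
    user_ctx = attend(hidden, tag_embeds)      # user attention
    return np.concatenate([visual_ctx, user_ctx])

# Toy example: 5 image-region features, 3 user-tag embeddings.
rng = np.random.default_rng(0)
hidden = rng.standard_normal(8)              # decoder hidden state
visual_feats = rng.standard_normal((5, 8))   # image region features
tag_embeds = rng.standard_normal((3, 8))     # user-tag embeddings
ctx = dual_attention_context(hidden, visual_feats, tag_embeds)
print(ctx.shape)  # (16,)
```

In a full model the two contexts would typically feed an RNN or transformer decoder at each step; the sketch only shows how the two attention sources can coexist and be fused.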


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ccc/5855536/887765b81768/sensors-18-00646-g001.jpg
