Towards Personalized Image Captioning via Multimodal Memory Networks.

Authors

Park Cesc Chunseong, Kim Byeongchang, Kim Gunhee

Publication information

IEEE Trans Pattern Anal Mach Intell. 2018 Apr 10. doi: 10.1109/TPAMI.2018.2824816.

Abstract

We address personalized image captioning, which generates a descriptive sentence for a user's image while accounting for prior knowledge such as the user's active vocabulary and the writing style of their previous documents. As applications of personalized image captioning, we solve two post automation tasks in social networks: hashtag prediction and post generation. Hashtag prediction produces a list of hashtags for an image, while post generation creates natural post text consisting of ordinary words, emojis, and even hashtags. We propose a novel personalized captioning model named the Context Sequence Memory Network (CSMN). Its unique updates over existing memory networks include (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words to memory to capture long-term information, and (iii) adopting a CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. For evaluation, we collect a new dataset, InstaPIC-1.1M, comprising 1.1M Instagram posts from 6.3K users. We further use the benchmark YFCC100M dataset to validate the generality of our approach. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show that the three novel features of the CSMN help enhance the performance of personalized image captioning over state-of-the-art captioning models.
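
The three memory ideas listed in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`read_memory`, `conv_over_slots`), the fixed smoothing kernel, the three-dimensional vectors, and the choice to attend with a plain dot-product softmax are all assumptions made for illustration. It only shows one memory holding several context types, a generated word being appended to that memory, and a convolution mixing adjacent ordered slots.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conv_over_slots(memory, kernel):
    # Zero-padded 1-D convolution along the slot order, so each
    # output slot mixes a slot with its immediate neighbours.
    half = len(kernel) // 2
    dim = len(memory[0])
    out = []
    for i in range(len(memory)):
        mixed = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(memory):
                for d in range(dim):
                    mixed[d] += w * memory[j][d]
        out.append(mixed)
    return out

def read_memory(query, memory, kernel=(0.25, 0.5, 0.25)):
    # Attention weights come from query-slot similarity; the context
    # vector is the weighted sum of the neighbour-mixed slots.
    weights = softmax([dot(query, slot) for slot in memory])
    mixed = conv_over_slots(memory, kernel)
    context = [
        sum(w * slot[d] for w, slot in zip(weights, mixed))
        for d in range(len(query))
    ]
    return weights, context

# (i) One memory stores several kinds of context (hypothetical vectors).
memory = [
    [1.0, 0.0, 0.0],  # stand-in for an image feature slot
    [0.0, 1.0, 0.0],  # stand-in for a user-vocabulary slot
]
query = [1.0, 0.0, 0.0]
weights, context = read_memory(query, memory)

# (ii) Append the embedding of the word just generated, then read again.
memory.append([0.0, 0.0, 1.0])
weights2, context2 = read_memory(query, memory)
```

In this sketch the memory grows by one slot per generated word, so later reads can attend to every earlier word directly instead of relying on a recurrent state.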
