Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):652-663. doi: 10.1109/TPAMI.2016.2587640. Epub 2016 Jul 7.

Abstract

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. Finally, given the recent surge of interest in this task, a competition was organized in 2015 using the newly released COCO dataset. We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research.
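The training objective stated in the abstract, maximizing the likelihood of the target sentence given the image, can be illustrated with a minimal sketch. It assumes the usual chain-rule factorization of the sentence probability over words, so that log p(S | I) is the sum of per-word log-probabilities. The function name `caption_log_likelihood` and the toy numbers are hypothetical illustrations, not taken from the paper; in a real system the per-step distributions would come from the recurrent network conditioned on the image.

```python
import math


def caption_log_likelihood(step_probs, caption_ids):
    """Return log p(S | I) as the sum over t of log p(S_t | I, S_0..S_{t-1}).

    step_probs:  one probability distribution over the vocabulary per step
                 (assumed here to be given; a model would produce them).
    caption_ids: the index of the target word at each step.
    """
    if len(step_probs) != len(caption_ids):
        raise ValueError("need exactly one distribution per target word")
    return sum(math.log(probs[word]) for probs, word in zip(step_probs, caption_ids))


# Hypothetical toy case: 3-word vocabulary, 2-word target caption.
step_probs = [
    [0.5, 0.25, 0.25],  # distribution for the first word
    [0.1, 0.8, 0.1],    # distribution for the second word
]
caption_ids = [0, 1]

log_lik = caption_log_likelihood(step_probs, caption_ids)
loss = -log_lik  # training minimizes the negative log-likelihood
```

Minimizing `loss` over all image-caption pairs in the training set is equivalent to maximizing the likelihood of the target descriptions.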

