Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):652-663. doi: 10.1109/TPAMI.2016.2587640. Epub 2016 Jul 7.

DOI: 10.1109/TPAMI.2016.2587640
PMID: 28055847

Abstract

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. Finally, given the recent surge of interest in this task, a competition was organized in 2015 using the newly released COCO dataset. We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research.
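
The training objective named in the abstract (maximizing the likelihood of the target description given the image) can be illustrated with a minimal sketch. It assumes the usual chain-rule factorization for recurrent sequence models, in which the log-likelihood of a caption is the sum of the log-probabilities of its words, each conditioned on the image and the preceding words. The abstract does not spell out this factorization, and the function names and probability values below are hypothetical, chosen only for illustration; no actual network is involved.

```python
import math
from typing import Sequence


def caption_log_likelihood(step_probs: Sequence[float]) -> float:
    """Log-likelihood of one caption.

    step_probs[t] is the probability a (hypothetical) model assigns to the
    correct word at position t, given the image and the words before it.
    """
    if any(p <= 0.0 or p > 1.0 for p in step_probs):
        raise ValueError("each probability must lie in (0, 1]")
    return sum(math.log(p) for p in step_probs)


def mean_negative_log_likelihood(batch: Sequence[Sequence[float]]) -> float:
    """Loss to minimize: average negative log-likelihood over captions.

    Minimizing this quantity is equivalent to maximizing the likelihood
    of the reference captions.
    """
    if not batch:
        raise ValueError("batch must contain at least one caption")
    return -sum(caption_log_likelihood(probs) for probs in batch) / len(batch)


# Illustrative, made-up per-word probabilities for two captions.
confident_caption = [0.9, 0.8, 0.95]
uncertain_caption = [0.3, 0.2, 0.4]

loss = mean_negative_log_likelihood([confident_caption, uncertain_caption])
```

A caption whose words all receive high probability contributes a log-likelihood close to zero, while one with low per-word probabilities contributes a large negative value, so the loss falls as the model assigns more probability to the reference descriptions.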

Similar articles

1. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.
   IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):652-663. doi: 10.1109/TPAMI.2016.2587640. Epub 2016 Jul 7.
2. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation.
   IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.
3. Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs.
   IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5910-5921. doi: 10.1109/TNNLS.2018.2813306. Epub 2018 Apr 5.
4. From Show to Tell: A Survey on Deep Learning-Based Image Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):539-559. doi: 10.1109/TPAMI.2022.3148210. Epub 2022 Dec 5.
5. Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture.
   Sci Rep. 2024 Sep 5;14(1):20762. doi: 10.1038/s41598-024-69664-1.
6. Deep Visual-Semantic Alignments for Generating Image Descriptions.
   IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):664-676. doi: 10.1109/TPAMI.2016.2598339. Epub 2016 Aug 5.
7. Chinese Image Caption Generation via Visual Attention and Topic Modeling.
   IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
8. Social Image Captioning: Exploring Visual Attention and User Attention.
   Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.
9. An Ensemble of Generation- and Retrieval-based Image Captioning with Dual Generator Generative Adversarial Network.
   IEEE Trans Image Process. 2020 Oct 15;PP. doi: 10.1109/TIP.2020.3028651.
10. Arabic Captioning for Images of Clothing Using Deep Learning.
    Sensors (Basel). 2023 Apr 7;23(8):3783. doi: 10.3390/s23083783.

Articles citing this paper

1. Image Captioning Based on Semantic Scenes.
   Entropy (Basel). 2024 Oct 18;26(10):876. doi: 10.3390/e26100876.
2. MultiBench: Multiscale Benchmarks for Multimodal Representation Learning.
   Adv Neural Inf Process Syst. 2021 Dec;2021(DB1):1-20.
3. Attention mechanism and mixup data augmentation for classification of COVID-19 Computed Tomography images.
   J King Saud Univ Comput Inf Sci. 2022 Sep;34(8):6199-6207. doi: 10.1016/j.jksuci.2021.07.005. Epub 2021 Jul 15.
4. An Anti-Noise Convolutional Neural Network for Bearing Fault Diagnosis Based on Multi-Channel Data.
   Sensors (Basel). 2023 Jul 25;23(15):6654. doi: 10.3390/s23156654.
5. Infrared Image Caption Based on Object-Oriented Attention.
   Entropy (Basel). 2023 May 22;25(5):826. doi: 10.3390/e25050826.
6. Detecting fake news by exploring the consistency of multimodal data.
   Inf Process Manag. 2021 Sep;58(5):102610. doi: 10.1016/j.ipm.2021.102610. Epub 2021 May 3.
7. Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images.
   J Imaging. 2022 Oct 22;8(11):294. doi: 10.3390/jimaging8110294.
8. A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint.
   Sensors (Basel). 2022 Sep 8;22(18):6816. doi: 10.3390/s22186816.
9. Deep GRU-CNN Model for COVID-19 Detection From Chest X-Rays Data.
   IEEE Access. 2021 May 5;10:35094-35105. doi: 10.1109/ACCESS.2021.3077592. eCollection 2022.
10. Medical Image Captioning Using Optimized Deep Learning Model.
    Comput Intell Neurosci. 2022 Mar 9;2022:9638438. doi: 10.1155/2022/9638438. eCollection 2022.