Suppr 超能文献


Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images.

Authors

Nursikuwagus Agus, Munir Rinaldi, Khodra Masayu Leylia

Affiliations

Doctoral Program of Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha No.10, Bandung 40132, Indonesia.

Department of Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha No.10, Bandung 40132, Indonesia.

Publication

J Imaging. 2022 Oct 22;8(11):294. doi: 10.3390/jimaging8110294.

DOI: 10.3390/jimaging8110294
PMID: 36354867
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9693370/
Abstract

Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: background object and foreground object. In contrast to the previous image-captioning research, generating captions from the geological images of rocks is more focused on the background of the images. This study proposed image captioning using a convolutional neural network, long short-term memory, and word2vec to generate words from the image. The proposed model was constructed by a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec and gave a dense output of 256 units. To make it properly grammatical, a sequence of predicted words was reconstructed into a sentence by the beam search algorithm with K = 3. An evaluation of the pre-trained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models used BLEU score methods for the N-gram. The BLEU scores achieved for BLEU-1 using these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. BLEU-2 showed scores of 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. BLEU-3 performed with scores of 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, BLEU-4 had scores of 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. Our CNN-C model outperformed the other models, especially the baseline model. Furthermore, there are several future challenges in studying captions, such as geological sentence structure, geological sentence phrase, and constructing words by a geological tagger.
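The abstract states that predicted word sequences are reconstructed into sentences by a beam search with K = 3. The decoding loop can be sketched in a minimal, self-contained way; the toy next-word probability table below is invented purely for illustration (in the paper these scores would come from the CNN+LSTM+word2vec decoder), so this is a sketch of the algorithm, not the authors' implementation:

```python
import math

# Toy next-word log-probabilities, invented purely for illustration.
# In the paper, these scores would come from the CNN+LSTM+word2vec decoder.
NEXT = {
    "<s>":   {"the": math.log(0.6), "a": math.log(0.4)},
    "the":   {"rock": math.log(0.7), "grain": math.log(0.3)},
    "a":     {"rock": math.log(0.5), "grain": math.log(0.5)},
    "rock":  {"</s>": math.log(1.0)},
    "grain": {"</s>": math.log(1.0)},
}

def beam_search(k=3, max_len=5):
    """Keep only the k highest-scoring partial sentences at each step."""
    beams = [(["<s>"], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "</s>":            # finished sentences carry over
                candidates.append((seq, score))
                continue
            for word, logp in NEXT[seq[-1]].items():
                candidates.append((seq + [word], score + logp))
        # Prune to the k best hypotheses (this is the "beam").
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == "</s>" for seq, _ in beams):
            break
    return beams

best, score = beam_search(k=3)[0]
print(" ".join(best[1:-1]))  # -> "the rock"
```

With K = 3 the decoder keeps three competing partial captions alive at each step instead of greedily committing to the single most likely word, which is why it tends to produce more grammatical sentences.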

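The models are evaluated with N-gram BLEU scores (BLEU-1 through BLEU-4). As a reference point, here is a self-contained sketch of BLEU-1, i.e. clipped unigram precision with a brevity penalty; this is the standard textbook definition, not code from the paper, and the example sentences are invented:

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """BLEU-1: clipped unigram precision times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate word's count by its count in the reference,
    # so repeating a matching word cannot inflate the score.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

# 5 of the 6 candidate words appear in the reference -> 5/6 ~= 0.8333
print(round(bleu1("the gray rock has coarse grains",
                  "the gray rock shows coarse grains"), 4))  # -> 0.8333
```

Higher-order BLEU-N scores repeat the same clipped-precision idea over bigrams, trigrams, and 4-grams and combine them geometrically, which is why the paper reports four separate BLEU-1 to BLEU-4 columns per model.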

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/2ad096bb7776/jimaging-08-00294-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/de3df3f072e0/jimaging-08-00294-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/d98aece7fdb7/jimaging-08-00294-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/afc245de23c5/jimaging-08-00294-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/7cf6ed57bdde/jimaging-08-00294-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/a087417110ff/jimaging-08-00294-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/dbfa501f75ea/jimaging-08-00294-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/ad35f49ae800/jimaging-08-00294-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb29/9693370/0497c7d4385d/jimaging-08-00294-g009.jpg

Similar Articles

1. Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images.
J Imaging. 2022 Oct 22;8(11):294. doi: 10.3390/jimaging8110294.
2. A Multilevel Transfer Learning Technique and LSTM Framework for Generating Medical Captions for Limited CT and DBT Images.
J Digit Imaging. 2022 Jun;35(3):564-580. doi: 10.1007/s10278-021-00567-7. Epub 2022 Feb 25.
3. Attention-Guided Image Captioning through Word Information.
Sensors (Basel). 2021 Nov 30;21(23):7982. doi: 10.3390/s21237982.
4. Captioning Ultrasound Images Automatically.
Med Image Comput Comput Assist Interv. 2019 Oct;22:338-346. doi: 10.1007/978-3-030-32251-9_37. Epub 2019 Oct 10.
5. Changes to Captions: An Attentive Network for Remote Sensing Change Captioning.
IEEE Trans Image Process. 2023;32:6047-6060. doi: 10.1109/TIP.2023.3328224. Epub 2023 Nov 8.
6. Weakly Supervised Captioning of Ultrasound Images.
Med Image Underst Anal (2022). 2022 Jul;13413:187-198. doi: 10.1007/978-3-031-12053-4_14.
7. Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network.
Sensors (Basel). 2022 Nov 1;22(21):8376. doi: 10.3390/s22218376.
8. Topic-Oriented Image Captioning Based on Order-Embedding.
IEEE Trans Image Process. 2019 Jun;28(6):2743-2754. doi: 10.1109/TIP.2018.2889922. Epub 2018 Dec 27.
9. Image Captioning Using Motion-CNN with Object Detection.
Sensors (Basel). 2021 Feb 10;21(4):1270. doi: 10.3390/s21041270.
10. Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.
Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.

References Cited in This Article

1. Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory.
Neural Process Lett. 2019 Aug;50(1):103-119. doi: 10.1007/s11063-018-09973-5. Epub 2019 Jan 11.
2. Social Image Captioning: Exploring Visual Attention and User Attention.
Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.
3. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.
IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1367-1381. doi: 10.1109/TPAMI.2017.2708709. Epub 2017 May 26.
4. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.
IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):652-663. doi: 10.1109/TPAMI.2016.2587640. Epub 2016 Jul 7.
5. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.
IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):677-691. doi: 10.1109/TPAMI.2016.2599174. Epub 2016 Sep 1.
6. Deep Visual-Semantic Alignments for Generating Image Descriptions.
IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):664-676. doi: 10.1109/TPAMI.2016.2598339. Epub 2016 Aug 5.
7. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.