Arabic Captioning for Images of Clothing Using Deep Learning.

Affiliation

Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia.

Publication information

Sensors (Basel). 2023 Apr 7;23(8):3783. doi: 10.3390/s23083783.

DOI:10.3390/s23083783
PMID:37112124
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10144643/
Abstract

Fashion is one of the many fields in which image captioning is applied. For e-commerce websites that hold tens of thousands of clothing images, automated item descriptions are highly desirable. This paper addresses the captioning of clothing images in Arabic using deep learning. Image captioning systems draw on both Computer Vision and Natural Language Processing techniques because they require visual and textual understanding. Many approaches have been proposed to build such systems. The most widely used are deep learning methods, which use an image model to analyze the visual content of the image and a language model to generate the caption. Caption generation in English with deep learning algorithms has received considerable attention from researchers, but a gap remains for Arabic because public datasets are often not available in that language. In this work, we created an Arabic dataset for captioning images of clothing, which we named "ArabicFashionData"; the model is the first for captioning images of clothing in Arabic. Moreover, we classified the attributes of the clothing images and used them as inputs to the decoder of our image captioning model to improve Arabic caption quality. In addition, we used an attention mechanism. Our approach achieved a BLEU-1 score of 88.52. The experimental findings are encouraging and suggest that, with a bigger dataset, the attributes-based image captioning model can achieve excellent results for Arabic image captioning.
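
For readers unfamiliar with the metric reported above, BLEU-1 is the clipped unigram precision of a generated caption against one or more reference captions, multiplied by a brevity penalty. The following is a minimal sketch of a sentence-level BLEU-1, not the authors' evaluation code: the function name `bleu1`, the whitespace tokenization, the 0 to 100 scaling, and the example captions are all assumptions made for illustration. The abstract does not say which BLEU implementation or tokenizer was used, and Arabic text may need additional preprocessing.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1 on a 0-100 scale (illustrative sketch).

    candidate:  generated caption as a string
    references: list of reference caption strings
    Tokenization by whitespace is an assumption, not the paper's method.
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each token, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)

    # Clipped unigram matches: a token is credited at most as often
    # as it appears in the most generous reference.
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate
    # length (ties resolved toward the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda L: (abs(L - len(cand)), L))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    return 100.0 * bp * precision


# Hypothetical captions, not taken from ArabicFashionData.
score = bleu1("a long red dress", ["a long red dress with short sleeves"])
print(round(score, 2))
```

In this made-up example every candidate word appears in the reference, so the precision is 1, and the score is reduced only by the brevity penalty because the candidate is shorter than the reference. Published scores are usually computed at corpus level, so a sentence-level sketch like this one is not directly comparable to the 88.52 reported in the abstract.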

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/e12c336ff4ec/sensors-23-03783-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/91ad1b59f579/sensors-23-03783-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/239e835ea1e8/sensors-23-03783-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/5076f4e65b15/sensors-23-03783-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/24d5e0ed2e97/sensors-23-03783-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/6535aa06cacb/sensors-23-03783-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/4d43af2de8eb/sensors-23-03783-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ff3/10144643/ee3ebf281499/sensors-23-03783-g008.jpg

Similar articles

1
Arabic Captioning for Images of Clothing Using Deep Learning.
Sensors (Basel). 2023 Apr 7;23(8):3783. doi: 10.3390/s23083783.
2
A Multilevel Transfer Learning Technique and LSTM Framework for Generating Medical Captions for Limited CT and DBT Images.
J Digit Imaging. 2022 Jun;35(3):564-580. doi: 10.1007/s10278-021-00567-7. Epub 2022 Feb 25.
3
Chinese Image Caption Generation via Visual Attention and Topic Modeling.
IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
4
Image Captioning Using Motion-CNN with Object Detection.
Sensors (Basel). 2021 Feb 10;21(4):1270. doi: 10.3390/s21041270.
5
Weakly Supervised Captioning of Ultrasound Images.
Med Image Underst Anal (2022). 2022 Jul;13413:187-198. doi: 10.1007/978-3-031-12053-4_14.
6
Context-Fused Guidance for Image Captioning Using Sequence-Level Training.
Comput Intell Neurosci. 2022 Jan 5;2022:9743123. doi: 10.1155/2022/9743123. eCollection 2022.
7
Enhancing image caption generation through context-aware attention mechanism.
Heliyon. 2024 Aug 19;10(17):e36272. doi: 10.1016/j.heliyon.2024.e36272. eCollection 2024 Sep 15.
8
Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture.
Sci Rep. 2024 Sep 5;14(1):20762. doi: 10.1038/s41598-024-69664-1.
9
Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.
Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.
10
A Survey on Learning Objects' Relationship for Image Captioning.
Comput Intell Neurosci. 2023 May 29;2023:8600853. doi: 10.1155/2023/8600853. eCollection 2023.
