Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
Sensors (Basel). 2023 Apr 7;23(8):3783. doi: 10.3390/s23083783.
Fashion is one of many fields in which image captioning is being applied. For e-commerce websites holding tens of thousands of images of clothing, automated item descriptions are highly desirable. This paper addresses captioning images of clothing in the Arabic language using deep learning. Image captioning systems rely on Computer Vision and Natural Language Processing techniques, since they require both visual and textual understanding. Many approaches have been proposed to build such systems. The most widely used are deep learning methods, in which an image model analyzes the visual content of the image and a language model generates the caption. Generating captions in English with deep learning has received great attention from many researchers, but there is still a gap in generating captions in Arabic, largely because public datasets are often not available in that language. In this work, we created an Arabic dataset for captioning images of clothing, which we named "ArabicFashionData"; the resulting model is the first for captioning images of clothing in Arabic. Moreover, we classified the attributes of the clothing images and used them as inputs to the decoder of our image captioning model to enhance Arabic caption quality, and we employed an attention mechanism. Our approach achieved a BLEU-1 score of 88.52. The experimental findings are encouraging and suggest that, with a larger dataset, the attribute-based image captioning model can achieve excellent results for Arabic image captioning.
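As a rough illustration of the encoder-decoder design the abstract describes (image features attended over at each step, with classified clothing attributes fed to the decoder), the following is a minimal sketch in PyTorch. It is not the authors' code: the layer sizes, the additive-style attention, and the choice to fuse attributes by concatenation with the word embedding and attended context are all illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's implementation) of an
# attribute-conditioned attention decoder for image captioning.
import torch
import torch.nn as nn

class AttributeAttentionDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, feat_dim=2048,
                 attr_dim=64, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # scores each image region given the current hidden state (additive-style attention)
        self.attn = nn.Linear(hidden_dim + feat_dim, 1)
        # decoder input = word embedding + attended image context + attribute vector
        self.lstm = nn.LSTMCell(embed_dim + feat_dim + attr_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, attrs, captions):
        # feats:    (B, R, feat_dim) regional CNN features from the image encoder
        # attrs:    (B, attr_dim) encoded clothing attributes (e.g., colour, sleeve type)
        # captions: (B, T) token ids of the ground-truth Arabic caption (teacher forcing)
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            # soft attention over image regions, conditioned on the hidden state
            scores = self.attn(torch.cat(
                [h.unsqueeze(1).expand(-1, feats.size(1), -1), feats], dim=-1))
            alpha = torch.softmax(scores, dim=1)        # (B, R, 1) attention weights
            context = (alpha * feats).sum(dim=1)        # (B, feat_dim) attended context
            x = torch.cat([self.embed(captions[:, t]), context, attrs], dim=-1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)               # (B, T, vocab_size)
```

At inference time, a model of this kind would feed back its own predicted token at each step instead of the ground-truth caption, and the generated Arabic captions would be scored against references with BLEU-1, the metric reported above.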