An accurate generation of image captions for blind people using extended convolutional atom neural network.

Author Information

Tejal Tiwary, Rajendra Prasad Mahapatra

Affiliations

Department of Computer Science and Engineering, SRMIST, NCR Campus, Ghaziabad, India.

Department of CSE, SRM Institute of Science & Technology, Delhi, NCR Campus, Ghaziabad, India.

Publication Information

Multimed Tools Appl. 2023;82(3):3801-3830. doi: 10.1007/s11042-022-13443-5. Epub 2022 Jul 15.

Abstract

Recent progress in image understanding and automatic image captioning (AIC) has attracted many researchers to apply artificial intelligence (AI) models to assist blind people. AIC combines principles from computer vision and natural language processing (NLP) to generate language descriptions of an observed image. This work presents a new deep-learning-based assistive technology that helps blind people distinguish food items during online grocery shopping. The proposed AIC model involves the following steps: data collection, non-captioned image selection, extraction of appearance and texture features, and automatic caption generation. First, data are collected from two public sources, and non-captioned images are selected using Adaptive Rain Optimization (ARO). Next, appearance features are extracted with a Spatial Derivative and Multi-scale (SDM) approach, and texture features with a Weighted Patch Local Binary Pattern (WPLBP). Finally, captions are generated automatically by an Extended Convolutional Atom Neural Network (ECANN), which combines convolutional neural network (CNN) and long short-term memory (LSTM) architectures in a caption-reuse system that selects the most accurate caption. The loss of the ECANN architecture is minimized with the Adaptive Atom Search (AAS) optimization algorithm. The model is implemented in Python and evaluated on two grocery datasets: the Freiburg Groceries dataset and the Grocery Store Dataset. The proposed ECANN model achieves 99.46% accuracy on the Grocery Store Dataset and 99.32% on the Freiburg Groceries dataset, and its performance is compared with existing models to demonstrate its advantage over prior work.
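
The record does not include the authors' implementation, but the CNN+LSTM encoder-decoder pattern that ECANN extends can be sketched as below. This is a minimal, generic PyTorch sketch: the ResNet-18 backbone, the layer sizes, and the name CaptionNet are illustrative assumptions, not the paper's architecture, and the ARO, SDM, WPLBP, and AAS components are omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class CaptionNet(nn.Module):
    """Minimal CNN-encoder + LSTM-decoder captioner (illustrative sketch only)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Frozen pretrained CNN stands in for the paper's (unspecified) visual backbone.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.img_proj = nn.Linear(512, embed_dim)   # map image feature into word-embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) integer token ids
        feats = self.encoder(images).flatten(1)      # (B, 512) global image feature
        feats = self.img_proj(feats).unsqueeze(1)    # (B, 1, E): image acts as the first "token"
        embeds = self.embed(captions)                # (B, T, E)
        inputs = torch.cat([feats, embeds], dim=1)   # (B, T+1, E)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                       # (B, T+1, vocab) next-token logits
```

In training, the returned logits would be scored against the shifted caption tokens with cross-entropy; the paper's approach would replace the standard optimizer with AAS and add its caption-reuse stage on top of a decoder of this kind.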
