An accurate generation of image captions for blind people using extended convolutional atom neural network.

Author Information

Tejal Tiwary, Rajendra Prasad Mahapatra

Affiliations

Department of Computer Science and Engineering, SRMIST, NCR Campus, Ghaziabad, India.

Department of CSE, SRM Institute of Science & Technology, Delhi, NCR Campus, Ghaziabad, India.

Publication Information

Multimed Tools Appl. 2023;82(3):3801-3830. doi: 10.1007/s11042-022-13443-5. Epub 2022 Jul 15.

Abstract

Recent progress in image understanding and automatic image captioning (AIC) has attracted many researchers to apply artificial intelligence (AI) models to assist blind people. AIC combines principles from computer vision and natural language processing (NLP) to generate language descriptions of an observed image. This work presents a new deep-learning-based assistive technology that helps blind people distinguish food items during online grocery shopping. The proposed AIC model involves the following steps: data collection, non-captioned image selection, extraction of appearance and texture features, and automatic caption generation. First, data are collected from two public sources, and non-captioned images are selected using Adaptive Rain Optimization (ARO). Next, appearance features are extracted with a Spatial Derivative and Multi-scale (SDM) approach, and texture features with a Weighted Patch Local Binary Pattern (WPLBP). Finally, captions are generated automatically by an Extended Convolutional Atom Neural Network (ECANN), which combines convolutional neural network (CNN) and long short-term memory (LSTM) architectures in a caption-reuse system that selects the most accurate caption. The loss of the ECANN architecture is minimized with the Adaptive Atom Search (AAS) optimization algorithm. The model is implemented in Python and evaluated on two grocery datasets: the Freiburg Groceries dataset and the Grocery Store Dataset. The proposed ECANN model achieves 99.46% accuracy on the Grocery Store Dataset and 99.32% on the Freiburg Groceries dataset, and its performance is compared with existing models to demonstrate its advantage over prior work.
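
The record does not include the authors' implementation, but the CNN+LSTM encoder-decoder pattern that ECANN extends can be sketched as below. This is a minimal, generic PyTorch sketch: the ResNet-18 backbone, the layer sizes, and the name CaptionNet are illustrative assumptions, not the paper's architecture, and the ARO, SDM, WPLBP, and AAS components are omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class CaptionNet(nn.Module):
    """Minimal CNN-encoder + LSTM-decoder captioner (illustrative sketch only)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Frozen pretrained CNN stands in for the paper's (unspecified) visual backbone.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.img_proj = nn.Linear(512, embed_dim)   # map image feature into word-embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) integer token ids
        feats = self.encoder(images).flatten(1)      # (B, 512) global image feature
        feats = self.img_proj(feats).unsqueeze(1)    # (B, 1, E): image acts as the first "token"
        embeds = self.embed(captions)                # (B, T, E)
        inputs = torch.cat([feats, embeds], dim=1)   # (B, T+1, E)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                       # (B, T+1, vocab) next-token logits
```

In training, the returned logits would be scored against the shifted caption tokens with cross-entropy; the paper's approach would replace the standard optimizer with AAS and add its caption-reuse stage on top of a decoder of this kind.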
