


An accurate generation of image captions for blind people using extended convolutional atom neural network.

Author information

Tiwary Tejal, Mahapatra Rajendra Prasad

Affiliations

Department of Computer Science and Engineering, SRMIST, NCR Campus, Ghaziabad, India.

Department of CSE, SRM Institute of Science & Technology, Delhi, NCR Campus, Ghaziabad, India.

Publication information

Multimed Tools Appl. 2023;82(3):3801-3830. doi: 10.1007/s11042-022-13443-5. Epub 2022 Jul 15.

DOI: 10.1007/s11042-022-13443-5
PMID: 35855372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9283099/
Abstract

Recently, progress in image understanding and AIC (Automatic Image Captioning) has attracted many researchers to use AI (Artificial Intelligence) models to assist blind people. AIC integrates the principles of both computer vision and NLP (Natural Language Processing) to generate automatic language descriptions of an observed image. This work presents a new deep-learning-based assistive technology that helps blind people distinguish food items in online grocery shopping. The proposed AIC model involves the following steps: data collection, non-captioned image selection, extraction of appearance and texture features, and generation of automatic image captions. Initially, the data is collected from two public sources, and the selection of non-captioned images is done using ARO (Adaptive Rain Optimization). Next, the appearance feature is extracted using the SDM (Spatial Derivative and Multi-scale) approach, and WPLBP (Weighted Patch Local Binary Pattern) is used to extract texture features. Finally, the captions are automatically generated using ECANN (Extended Convolutional Atom Neural Network). The ECANN model combines the CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) architectures into a caption-reuse system that selects the most accurate caption. The loss in the ECANN architecture is minimized using the AAS (Adaptive Atom Search) optimization algorithm. The implementation tool is Python, and the datasets used for the analysis are two grocery datasets (Freiburg Groceries and the Grocery Store Dataset). The proposed ECANN model achieved 99.46% accuracy on the Grocery Store Dataset and 99.32% accuracy on the Freiburg Groceries dataset. The performance of the proposed ECANN model is compared with other existing models to verify its superiority over existing works.
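The WPLBP texture descriptor mentioned in the abstract extends the classic Local Binary Pattern (LBP). The weighted-patch variant itself is not spelled out here, so the sketch below shows only the standard 8-neighbor LBP it builds on: each interior pixel is encoded as an 8-bit number by thresholding its neighbors against the center value, and the normalized histogram of codes serves as the texture feature. This is a minimal NumPy illustration, not the authors' implementation:

```python
import numpy as np

def lbp_8neighbors(img):
    """Classic 3x3 Local Binary Pattern: each interior pixel gets an
    8-bit code from thresholding its 8 neighbors against its own value."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Clockwise neighbor offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(img, bins=256):
    """Normalized histogram of LBP codes -- the texture descriptor."""
    codes = lbp_8neighbors(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / max(codes.size, 1)
```

On a perfectly flat image every neighbor equals the center, so every code is 255 and the histogram concentrates in a single bin; textured regions spread the histogram across many codes, which is what makes it useful as a feature.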


Similar articles

1. An accurate generation of image captions for blind people using extended convolutional atom neural network.
   Multimed Tools Appl. 2023;82(3):3801-3830. doi: 10.1007/s11042-022-13443-5. Epub 2022 Jul 15.
2. Chinese Image Caption Generation via Visual Attention and Topic Modeling.
   IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
3. A Multilevel Transfer Learning Technique and LSTM Framework for Generating Medical Captions for Limited CT and DBT Images.
   J Digit Imaging. 2022 Jun;35(3):564-580. doi: 10.1007/s10278-021-00567-7. Epub 2022 Feb 25.
4. Towards Generating and Evaluating Iconographic Image Captions of Artworks.
   J Imaging. 2021 Jul 23;7(8):123. doi: 10.3390/jimaging7080123.
5. Image Captioning Using Motion-CNN with Object Detection.
   Sensors (Basel). 2021 Feb 10;21(4):1270. doi: 10.3390/s21041270.
6. Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images.
   J Imaging. 2022 Oct 22;8(11):294. doi: 10.3390/jimaging8110294.
7. Human-computer interaction based health diagnostics using ResNet34 for tongue image classification.
   Comput Methods Programs Biomed. 2022 Nov;226:107096. doi: 10.1016/j.cmpb.2022.107096. Epub 2022 Aug 28.
8. Language Processing Model Construction and Simulation Based on Hybrid CNN and LSTM.
   Comput Intell Neurosci. 2021 Jul 6;2021:2578422. doi: 10.1155/2021/2578422. eCollection 2021.
9. Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.
   Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.
10. Captioning Ultrasound Images Automatically.
    Med Image Comput Comput Assist Interv. 2019 Oct;22:338-346. doi: 10.1007/978-3-030-32251-9_37. Epub 2019 Oct 10.

Cited by

1. Integrating AI and Assistive Technologies in Healthcare: Insights from a Narrative Review of Reviews.
   Healthcare (Basel). 2025 Mar 4;13(5):556. doi: 10.3390/healthcare13050556.
2. Atom Search Optimization: a comprehensive review of its variants, applications, and future directions.
   PeerJ Comput Sci. 2025 Feb 28;11:e2722. doi: 10.7717/peerj-cs.2722. eCollection 2025.