

Research on image content description in Chinese based on fusion of image global and local features.

Affiliations

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China.

Department of Mathematics and Computer Science, Fort Valley State University, Fort Valley, GA, United States of America.

Publication

PLoS One. 2022 Aug 29;17(8):e0271322. doi: 10.1371/journal.pone.0271322. eCollection 2022.

DOI: 10.1371/journal.pone.0271322
PMID: 36037226
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9423645/
Abstract

Most image content modelling methods are designed for English descriptions, whose syntactic structure differs from Chinese. The few existing Chinese image description models do not fully integrate the global and local features of an image, limiting their capability to represent image details. In this paper, an encoder-decoder architecture based on the fusion of global and local features is used to describe Chinese image content. In the encoding stage, the global and local features of the image are extracted by a Convolutional Neural Network (CNN) and a target detection network, and fed to the feature fusion module. In the decoding stage, an image feature attention mechanism is used to calculate the weights of word vectors, and a new gating mechanism is added to the traditional Long Short-Term Memory (LSTM) network to emphasize the fused image features and the corresponding word vectors. In the description generation stage, the beam search algorithm is used to optimize the word vector generation process. These three stages strengthen the integration of the global and local features of the image, allowing the model to fully understand its details. The experimental results show that the model improves the quality of Chinese descriptions of image content: compared with the baseline model, the CIDEr score improves by 20.07%, and other evaluation indices also improve significantly.
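The description-generation stage mentioned in the abstract relies on beam search, which keeps several high-scoring partial captions alive instead of committing to the single most likely word at each step. A minimal sketch of the idea in pure Python (the `step_logprobs`/`toy_model` interface here is hypothetical and stands in for the paper's gated-LSTM decoder over fused image features, which the abstract does not specify in detail):

```python
import math

def beam_search(step_logprobs, beam_width=3, max_len=4, eos=0):
    """Keep the beam_width best-scoring partial sequences at each step.

    step_logprobs(prefix) returns {token: log-probability}; it stands in
    for the decoder's next-word distribution.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:      # finished sequences pass through
                candidates.append((seq, score))
                continue
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # keep only the beam_width best-scoring candidates
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

def toy_model(prefix):
    """Hypothetical distribution where the greedy first pick (token 1)
    leads to a lower-probability sequence than starting with token 2."""
    if prefix == []:
        return {1: math.log(0.55), 2: math.log(0.45)}
    if prefix == [1]:
        return {3: math.log(0.3), 0: math.log(0.7)}   # p([1, 0]) = 0.385
    if prefix == [2]:
        return {3: math.log(0.9), 0: math.log(0.1)}   # p([2, 3]) = 0.405
    return {0: math.log(1.0)}                         # force end-of-sequence

print(beam_search(toy_model, beam_width=1))  # greedy: [1, 0]
print(beam_search(toy_model, beam_width=3))  # beam:   [2, 3, 0]
```

Because a width-1 beam is exactly greedy decoding, the toy model illustrates why a wider beam can "optimize the word vector generation process": the sequence starting with token 2 has lower first-step probability but higher total probability, and only the wider beam recovers it.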


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fb/9423645/bfbaa4fecde7/pone.0271322.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fb/9423645/907eb3b91cb7/pone.0271322.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fb/9423645/389781af2386/pone.0271322.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fb/9423645/a236450c87b5/pone.0271322.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fb/9423645/ea69ad331773/pone.0271322.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5fb/9423645/d57a68d0923c/pone.0271322.g006.jpg

Similar articles

1. Research on image content description in Chinese based on fusion of image global and local features. PLoS One. 2022 Aug 29;17(8):e0271322. doi: 10.1371/journal.pone.0271322. eCollection 2022.
2. Chinese Image Caption Generation via Visual Attention and Topic Modeling. IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
3. Multichannel Two-Dimensional Convolutional Neural Network Based on Interactive Features and Group Strategy for Chinese Sentiment Analysis. Sensors (Basel). 2022 Jan 18;22(3):714. doi: 10.3390/s22030714.
4. Dual Position Relationship Transformer for Image Captioning. Big Data. 2022 Dec;10(6):515-527. doi: 10.1089/big.2021.0262. Epub 2022 Jan 4.
5. Multiple attention-based encoder-decoder networks for gas meter character recognition. Sci Rep. 2022 Jun 20;12(1):10371. doi: 10.1038/s41598-022-14434-0.
6. Dual Global Enhanced Transformer for image captioning. Neural Netw. 2022 Apr;148:129-141. doi: 10.1016/j.neunet.2022.01.011. Epub 2022 Jan 21.
7. A novel M-SegNet with global attention CNN architecture for automatic segmentation of brain MRI. Comput Biol Med. 2021 Sep;136:104761. doi: 10.1016/j.compbiomed.2021.104761. Epub 2021 Aug 13.
8. GC-Net: Global context network for medical image segmentation. Comput Methods Programs Biomed. 2020 Jul;190:105121. doi: 10.1016/j.cmpb.2019.105121. Epub 2019 Oct 4.
9. Multi-Scale Squeeze U-SegNet with Multi Global Attention for Brain MRI Segmentation. Sensors (Basel). 2021 May 12;21(10):3363. doi: 10.3390/s21103363.
10. Entity recognition in Chinese clinical text using attention-based CNN-LSTM-CRF. BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):74. doi: 10.1186/s12911-019-0787-y.

Cited by

1. Image captioning in Bengali language using visual attention. PLoS One. 2025 Feb 13;20(2):e0309364. doi: 10.1371/journal.pone.0309364. eCollection 2025.

References

1. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation. IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.
2. Chinese Image Caption Generation via Visual Attention and Topic Modeling. IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
3. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.
4. Long short-term memory. Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.