


Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments.

Authors

Niu Kai, Huang Yan, Ouyang Wanli, Wang Liang

Publication

IEEE Trans Image Process. 2020 Apr 7. doi: 10.1109/TIP.2020.2984883.

DOI: 10.1109/TIP.2020.2984883
PMID: 32275593
Abstract

Description-based person re-identification (Re-id) is an important task in video surveillance that requires discriminative cross-modal representations to distinguish different people. It is difficult to directly measure the similarity between images and descriptions due to modality heterogeneity (the cross-modal problem). Moreover, because all samples belong to a single category, person (the fine-grained problem), this task is even harder than conventional image-description matching. In this paper, we propose a Multi-granularity Image-text Alignments (MIA) model to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person Re-id. Specifically, alignments at three granularities, i.e., global-global, global-local, and local-local, are carried out hierarchically. Firstly, the global-global alignment in the Global Contrast (GC) module matches the global contexts of images and descriptions. Secondly, the global-local alignment in the Relation-guided Global-local Alignment (RGA) module employs the potential relations between local components and global contexts to highlight the distinguishable components while adaptively eliminating the uninvolved ones. Thirdly, for the local-local alignment, we match visual human parts with noun phrases in the Bi-directional Fine-grained Matching (BFM) module. The whole network, combining multiple granularities, can be trained end-to-end without complex preprocessing. To address the difficulties in training the combination of multiple granularities, an effective step training strategy is proposed to train these granularities step by step. Extensive experiments and analysis show that our method achieves state-of-the-art performance on the CUHK-PEDES dataset and outperforms previous methods by a significant margin.

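The abstract describes combining three alignment granularities into one image-text similarity score. The following is a minimal, hypothetical Python sketch of that idea, not the authors' implementation: feature extraction is assumed to have already produced plain vectors, the relation weighting is approximated with a simple softmax over similarities, and equal-weight fusion of the three scores is an assumption of this sketch.

```python
# Hypothetical sketch of multi-granularity image-text similarity,
# in the spirit of the MIA model's GC / RGA / BFM modules.
# Features are plain Python lists of floats; extractors are assumed.
import math


def cosine(u, v):
    # Cosine similarity between two feature vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0


def softmax(xs):
    # Numerically stable softmax for weighting local components.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def global_global(img_glob, txt_glob):
    # GC-style idea: match global contexts of image and description.
    return cosine(img_glob, txt_glob)


def global_local(img_glob, txt_parts):
    # RGA-style idea (simplified): weight each local text component by
    # its relation to the global image context, then aggregate, so that
    # distinguishable components dominate and uninvolved ones fade.
    sims = [cosine(img_glob, p) for p in txt_parts]
    weights = softmax(sims)
    return sum(w * s for w, s in zip(weights, sims))


def local_local(img_parts, txt_phrases):
    # BFM-style idea: bi-directional fine-grained matching -- each visual
    # part matches its best noun phrase, and vice versa.
    i2t = sum(max(cosine(p, q) for q in txt_phrases) for p in img_parts) / len(img_parts)
    t2i = sum(max(cosine(q, p) for p in img_parts) for q in txt_phrases) / len(txt_phrases)
    return (i2t + t2i) / 2


def mia_similarity(img_glob, img_parts, txt_glob, txt_phrases):
    # Fuse the three granularities; equal weights are an assumption here.
    return (global_global(img_glob, txt_glob)
            + global_local(img_glob, txt_phrases)
            + local_local(img_parts, txt_phrases))
```

For a perfectly matched pair (identical global and local features), each granularity contributes its maximum of 1.0, so the fused score is 3.0; mismatched components lower the global-local and local-local terms first, which is what makes the finer granularities discriminative.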

Similar Articles

1. Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments.
IEEE Trans Image Process. 2020 Apr 7. doi: 10.1109/TIP.2020.2984883.
2. Learning Aligned Image-Text Representations Using Graph Attentive Relational Network.
IEEE Trans Image Process. 2021;30:1840-1852. doi: 10.1109/TIP.2020.3048627. Epub 2021 Jan 18.
3. CLIP-Driven Fine-Grained Text-Image Person Re-Identification.
IEEE Trans Image Process. 2023;32:6032-6046. doi: 10.1109/TIP.2023.3327924. Epub 2023 Nov 7.
4. Image-Specific Information Suppression and Implicit Local Alignment for Text-Based Person Search.
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17973-17986. doi: 10.1109/TNNLS.2023.3310118. Epub 2024 Dec 2.
5. MAVA: Multi-level Adaptive Visual-textual Alignment by Cross-media Bi-attention Mechanism.
IEEE Trans Image Process. 2019 Nov 22. doi: 10.1109/TIP.2019.2952085.
6. Graph Sampling-Based Multi-Stream Enhancement Network for Visible-Infrared Person Re-Identification.
Sensors (Basel). 2023 Sep 18;23(18):7948. doi: 10.3390/s23187948.
7. Hybrid Attention Network for Language-Based Person Search.
Sensors (Basel). 2020 Sep 15;20(18):5279. doi: 10.3390/s20185279.
8. Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation.
IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):8589-8601. doi: 10.1109/TNNLS.2022.3151631. Epub 2023 Oct 27.
9. Cross-Modal Attention With Semantic Consistence for Image-Text Matching.
IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5412-5425. doi: 10.1109/TNNLS.2020.2967597. Epub 2020 Nov 30.
10. Decoupled Cross-Modal Phrase-Attention Network for Image-Sentence Matching.
IEEE Trans Image Process. 2024;33:1326-1337. doi: 10.1109/TIP.2022.3197972. Epub 2024 Feb 13.

Cited By

1. Image region semantic enhancement and symmetric semantic completion for text-to-image person search.
Sci Rep. 2025 Jul 1;15(1):21224. doi: 10.1038/s41598-025-00904-8.