Channel-shuffled transformers for cross-modality person re-identification in video.

Authors

Kasantikul Rangwan, Kusakunniran Worapan, Wu Qiang, Wang Zhiyong

Affiliations

Faculty of Information and Communication Technology, Mahidol University, 999 Phuttamonthon 4 Road, Salaya, 73170, Nakhon Pathom, Thailand.

School of Computer Science, The University of Sydney, Camperdown, 2006, New South Wales, Australia.

Publication information

Sci Rep. 2025 Apr 29;15(1):15009. doi: 10.1038/s41598-025-00063-w.

DOI: 10.1038/s41598-025-00063-w
PMID: 40301413
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12041324/
Abstract

Effective person re-identification (Re-ID) across different modalities (such as daylight vs. night vision) is crucial for surveillance applications. Information from multiple frames is essential for effective re-identification, particularly where visual cues from individual frames become less reliable. While transformers can enhance temporal information extraction, the large number of channels required for effective feature encoding introduces scaling challenges, which can lead to overfitting and instability during training. We therefore propose a novel Channel-Shuffled Temporal Transformer (CSTT) that processes multi-frame sequences in conjunction with a ResNet backbone, forming the Hybrid Channel-Shuffled Transformer Net (HCSTNET). Replacing the fully connected layers in standard multi-head attention with ShuffleNet-like structures is key to integrating transformer attention with a ResNet backbone. These ShuffleNet-like structures reduce overfitting by cutting the parameter count through channel grouping, and further improve the learned attention through channel shuffling. In our tests on the SYSU-MM01 dataset, compared against simple averaging of multiple frames, only the temporal transformer with channel shuffling achieved a measurable improvement over the baseline. We also investigate the optimal partitioning of the feature maps within this architecture.
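The abstract's central design choice, replacing the fully connected projections of multi-head attention with ShuffleNet-like structures, can be illustrated with a minimal plain-Python sketch. It is not the authors' CSTT code: the function names (`channel_shuffle`, `grouped_linear`, `shuffled_projection`, `dense_params`, `grouped_params`) are hypothetical, and the sketch works on a single feature vector rather than batched feature maps. Channel grouping splits the C channels into g groups, each with its own (C/g) × (C/g) weight matrix, so a projection's weight count falls from C² to C²/g; the channel shuffle then interleaves the groups so that a following grouped layer mixes information across all of them.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = output channels, columns = input channels


def channel_shuffle(x: Vector, groups: int) -> Vector:
    """ShuffleNet-style channel shuffle for one feature vector.

    Equivalent to reshape (groups, C // groups) -> transpose -> flatten,
    so consecutive output channels come from different groups.
    """
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(x) // groups
    return [x[g * per_group + j] for j in range(per_group) for g in range(groups)]


def grouped_linear(x: Vector, weights: List[Matrix]) -> Vector:
    """Block-diagonal projection: each channel group has its own small matrix.

    Channels in one group never see channels from another group, which is
    why a channel shuffle is needed between stacked grouped layers.
    """
    groups = len(weights)
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(x) // groups
    out: Vector = []
    for g, w in enumerate(weights):
        chunk = x[g * per_group:(g + 1) * per_group]
        for row in w:
            out.append(sum(r * c for r, c in zip(row, chunk)))
    return out


def shuffled_projection(x: Vector, weights: List[Matrix]) -> Vector:
    """Hypothetical stand-in for a dense Q/K/V projection: grouped layer, then shuffle."""
    return channel_shuffle(grouped_linear(x, weights), groups=len(weights))


def dense_params(c_in: int, c_out: int) -> int:
    """Weight count of a fully connected layer (bias ignored)."""
    return c_in * c_out


def grouped_params(c_in: int, c_out: int, groups: int) -> int:
    """Weight count of a grouped layer: g blocks of (c_out/g x c_in/g)."""
    return groups * (c_in // groups) * (c_out // groups)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(shuffled_projection([1.0, 2.0, 3.0, 4.0], [identity, identity]))  # [1.0, 3.0, 2.0, 4.0]
    print(dense_params(2048, 2048), grouped_params(2048, 2048, 8))  # 4194304 524288
```

With a hypothetical width of 2048 channels (the output width of a ResNet-50 backbone) and 8 groups, one projection shrinks from 4,194,304 to 524,288 weights; standard multi-head attention has four such projections (query, key, value and output). The channel width and group count actually used in HCSTNET are not given in this abstract.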

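The baseline named in the abstract, simple averaging of per-frame features, can be contrasted with attention-weighted temporal pooling. This is a simplified sketch under stated assumptions, not the CSTT: frames are scored against a single hypothetical learned `query` vector with scaled dot-product attention, whereas a temporal transformer applies multi-head self-attention across the frame sequence. The functions `temporal_average` and `temporal_attention_pool` are illustrative names.

```python
import math
from typing import List

Vector = List[float]


def temporal_average(frames: List[Vector]) -> Vector:
    """Baseline: average the per-frame feature vectors channel by channel."""
    n = len(frames)
    return [sum(channel) / n for channel in zip(*frames)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frames: List[Vector], query: Vector) -> Vector:
    """Weight frames by scaled dot-product similarity to a hypothetical query.

    Assumes every frame vector has the same length as the query.
    """
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, frame)) / math.sqrt(d) for frame in frames]
    weights = softmax(scores)
    # Weighted sum over frames, one output value per channel.
    return [sum(w * frame[c] for w, frame in zip(weights, frames)) for c in range(d)]


if __name__ == "__main__":
    clip = [[1.0, 2.0], [3.0, 4.0]]
    print(temporal_average(clip))                     # [2.0, 3.0]
    print(temporal_attention_pool(clip, [0.0, 0.0]))  # [2.0, 3.0]: uniform weights
```

When every frame receives the same score, the softmax weights are uniform and attention pooling reduces to averaging; a learned query lets the model down-weight less reliable frames instead.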

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/747b/12041324/8bef9f9995ae/41598_2025_63_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/747b/12041324/6ccaf8945d44/41598_2025_63_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/747b/12041324/33dc05b5ac72/41598_2025_63_Fig3_HTML.jpg

Similar articles

1
Channel-shuffled transformers for cross-modality person re-identification in video.
Sci Rep. 2025 Apr 29;15(1):15009. doi: 10.1038/s41598-025-00063-w.
2
Homogeneous-to-Heterogeneous: Unsupervised Learning for RGB-Infrared Person Re-Identification.
IEEE Trans Image Process. 2021;30:6392-6407. doi: 10.1109/TIP.2021.3092578. Epub 2021 Jul 14.
3
Video-based person re-identification with complementary local and global features using a graph transformer.
Math Biosci Eng. 2024 Jul 23;21(7):6694-6709. doi: 10.3934/mbe.2024293.
4
An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition.
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2496-2509. doi: 10.1109/TNNLS.2022.3190367. Epub 2024 Feb 5.
5
Graph Sampling-Based Multi-Stream Enhancement Network for Visible-Infrared Person Re-Identification.
Sensors (Basel). 2023 Sep 18;23(18):7948. doi: 10.3390/s23187948.
6
SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.
Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
7
Bi-Directional Exponential Angular Triplet Loss for RGB-Infrared Person Re-Identification.
IEEE Trans Image Process. 2021;30:1583-1595. doi: 10.1109/TIP.2020.3045261. Epub 2021 Jan 11.
8
Deeply Coupled Convolution-Transformer With Spatial-Temporal Complementary Learning for Video-Based Person Re-Identification.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13753-13763. doi: 10.1109/TNNLS.2023.3271353. Epub 2024 Oct 7.
9
A Multi-Attention Approach for Person Re-Identification Using Deep Learning.
Sensors (Basel). 2023 Apr 2;23(7):3678. doi: 10.3390/s23073678.
10
HViT: Hybrid vision inspired transformer for the assessment of carotid artery plaque by addressing the cross-modality domain adaptation problem in MRI.
Comput Med Imaging Graph. 2023 Oct;109:102295. doi: 10.1016/j.compmedimag.2023.102295. Epub 2023 Sep 9.