Hu Xintong, Liu Peishun, Wang Xuefang, Wu Peiyao, Tang Ruichun
Faculty of Information Science and Engineering, Ocean University of China, Qingdao, Shandong, 266100, China.
School of Mathematical Sciences, Ocean University of China, Qingdao, Shandong, 266100, China.
Vis Comput Ind Biomed Art. 2025 Apr 16;8(1):10. doi: 10.1186/s42492-025-00190-1.
In domain generalization person re-identification (ReID), pedestrian image features exhibit significant intra-class variability and inter-class similarity. Existing methods rely on a single feature extraction architecture and struggle to capture both global context and local spatial information, which weakens generalization to unseen domains. To address this issue, a domain generalization person ReID method, LViT-Net, is proposed that combines local semantics with multi-feature cross fusion. LViT-Net adopts a dual-branch encoder with a parallel hierarchical structure to extract both local and global discriminative features. In the local branch, the local multi-scale feature fusion module fuses local feature units at different scales so that fine-grained local features at each level are captured accurately, which makes the features more robust. In the global branch, the dual feature cross fusion module fuses local features with global semantic information, focusing on critical semantic information and enabling mutual refinement and matching of local and global features. This allows the model to balance detailed and holistic information dynamically and to form robust feature representations of pedestrians. Extensive experiments demonstrate the effectiveness of LViT-Net: in both single-source and multi-source comparison experiments, the proposed method outperforms existing state-of-the-art methods.
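The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of pooling local feature units at several scales and then fusing them with a global descriptor. It is not the authors' LViT-Net code. Every name here (`avg_pool`, `multi_scale_local_features`, `cross_fuse`), the window sizes, and the choice of a residual dot-product attention step are assumptions made for illustration, and plain Python lists stand in for learned feature maps.

```python
import math

def avg_pool(tokens, window):
    """Average consecutive, non-overlapping groups of `window` token vectors."""
    pooled = []
    for start in range(0, len(tokens) - window + 1, window):
        group = tokens[start:start + window]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / window for d in range(dim)])
    return pooled

def multi_scale_local_features(tokens, windows=(1, 2, 4)):
    """Collect local units pooled at several scales (hypothetical fusion rule)."""
    units = []
    for window in windows:
        units.extend(avg_pool(tokens, window))
    return units

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_fuse(global_vec, local_units):
    """Use the global vector as a query over the local units and add the
    attended local summary back to it (an assumed residual attention step)."""
    scale = math.sqrt(len(global_vec))
    scores = [sum(g * u for g, u in zip(global_vec, unit)) / scale
              for unit in local_units]
    weights = softmax(scores)
    attended = [sum(w * unit[d] for w, unit in zip(weights, local_units))
                for d in range(len(global_vec))]
    fused = [g + a for g, a in zip(global_vec, attended)]
    return fused, weights

# Toy input: four 2-dimensional patch features standing in for an image.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
local_units = multi_scale_local_features(tokens)
# Mean over all tokens as a stand-in for a global descriptor.
global_vec = [sum(vec[d] for vec in tokens) / len(tokens) for d in range(2)]
fused, weights = cross_fuse(global_vec, local_units)
```

In this toy setup the attention weights show which local units the global descriptor draws on most. A real model would replace the averages and dot products with learned convolutional and transformer layers.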