VIPS: Learning-View-Invariant Feature for Person Search.

Authors

Wang Hexu, Luo Wenlong, Wu Wei, Xie Fei, Liu Jindong, Li Jing, Zhang Shizhou

Affiliations

Xi'an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, Xijing University, Xi'an 710123, China.

School of Information Science and Technology, Northwest University, Xi'an 710100, China.

Publication information

Sensors (Basel). 2025 Aug 29;25(17):5362. doi: 10.3390/s25175362.

DOI:10.3390/s25175362
PMID:40942791
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12431201/
Abstract

Unmanned aerial vehicles (UAVs) have become indispensable tools for surveillance, enabled by their ability to capture multi-perspective imagery in dynamic environments. Among critical UAV-based tasks, cross-platform person search (detecting and identifying individuals across distributed camera networks) presents unique challenges. Severe viewpoint variations, occlusions, and cluttered backgrounds in UAV-captured data degrade the performance of conventional discriminative models, which struggle to maintain robustness under such geometric and semantic disparities. To address this, we propose View-Invariant Person Search (VIPS), a novel two-stage framework combining Faster R-CNN with a view-invariant re-identification (VIReID) module. Unlike conventional discriminative models, VIPS leverages the semantic flexibility of large vision-language models (VLMs) and adopts a two-stage training strategy to decouple and align text-based ID descriptors and visual features, enabling robust cross-view matching through shared semantic embeddings. To mitigate noise from occlusions and cluttered UAV-captured backgrounds, we introduce a learnable mask generator for feature purification. Furthermore, drawing from vision-language models, we design view prompts to explicitly encode perspective shifts into feature representations, enhancing adaptability to UAV-induced viewpoint changes. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, with ablation studies validating the efficacy of each component. Beyond technical advancements, this work highlights the potential of VLM-derived semantic alignment for UAV applications, offering insights for future research in real-time UAV-based surveillance systems.
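
The abstract gives no implementation details, so the following is only a minimal NumPy sketch of two ideas it names: purifying features with a spatial mask, and matching a query against a gallery in a shared embedding space. It is not the authors' code. The function names (`masked_pool`, `rank_gallery`, `toy_feature_map`), the sigmoid soft mask, the cosine-similarity ranking, and the hand-set mask logits are all assumptions made for illustration; in VIPS the mask would come from a learned generator and the embeddings from a vision-language model.

```python
import numpy as np


def masked_pool(feature_map: np.ndarray, mask_logits: np.ndarray) -> np.ndarray:
    """Pool a (C, H, W) feature map into a C-dim vector with a soft spatial mask.

    mask_logits has shape (H, W). Here it is a plain input (assumption); a
    trained model would predict it with a learnable mask generator.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))   # sigmoid: weights in (0, 1)
    weighted = feature_map * mask[None, :, :]    # down-weight masked-out cells
    return weighted.sum(axis=(1, 2)) / (mask.sum() + 1e-8)


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)


def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices ordered from most to least cosine-similar."""
    sims = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-sims)


def toy_feature_map() -> np.ndarray:
    """Hypothetical 4-channel 4x4 map: 'person' signal (channel 0) in the
    central 2x2 block, background 'clutter' (channel 1) everywhere else."""
    fmap = np.zeros((4, 4, 4))
    fmap[1, :, :] = 1.0
    fmap[1, 1:3, 1:3] = 0.0
    fmap[0, 1:3, 1:3] = 1.0
    return fmap


# Gallery of two toy embeddings: index 0 is the person, index 1 is clutter.
gallery = np.eye(4)[:2]
fmap = toy_feature_map()

# Hand-set mask that focuses on the central block (stand-in for a learned mask).
focus = np.full((4, 4), -6.0)
focus[1:3, 1:3] = 6.0
uniform = np.zeros((4, 4))  # equal weight everywhere, i.e. no purification

purified_rank = rank_gallery(masked_pool(fmap, focus), gallery)
unmasked_rank = rank_gallery(masked_pool(fmap, uniform), gallery)
print("with mask:", purified_rank, "without mask:", unmasked_rank)
```

In this toy setup the unmasked descriptor is dominated by the twelve clutter cells and ranks the clutter embedding first, while the masked descriptor ranks the person embedding first. This only illustrates why suppressing background locations before pooling can help matching; it says nothing about the gains reported in the paper.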

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e1/12431201/f8ca915a6c47/sensors-25-05362-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e1/12431201/aa379e220f67/sensors-25-05362-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e1/12431201/4d3372a4f5e7/sensors-25-05362-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e1/12431201/958c8e0c9ad5/sensors-25-05362-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e1/12431201/65297fe76328/sensors-25-05362-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e1/12431201/ed6f37c5bec8/sensors-25-05362-g006.jpg

Similar articles

1
VIPS: Learning-View-Invariant Feature for Person Search.
Sensors (Basel). 2025 Aug 29;25(17):5362. doi: 10.3390/s25175362.
2
Integrated neural network framework for multi-object detection and recognition using UAV imagery.
Front Neurorobot. 2025 Jul 30;19:1643011. doi: 10.3389/fnbot.2025.1643011. eCollection 2025.
3
Convolutional transform learning based fusion framework for scale invariant long term target detection and tracking in unmanned aerial vehicles.
Sci Rep. 2025 Aug 2;15(1):28248. doi: 10.1038/s41598-025-09652-1.
4
UAV-DETR: An Enhanced RT-DETR Architecture for Efficient Small Object Detection in UAV Imagery.
Sensors (Basel). 2025 Jul 24;25(15):4582. doi: 10.3390/s25154582.
5
Prescription of Controlled Substances: Benefits and Risks
6
Mixture of prompts learning for vision-language models.
Front Artif Intell. 2025 Jun 10;8:1580973. doi: 10.3389/frai.2025.1580973. eCollection 2025.
7
An Efficient Algorithm for Small Livestock Object Detection in Unmanned Aerial Vehicle Imagery.
Animals (Basel). 2025 Jun 18;15(12):1794. doi: 10.3390/ani15121794.
8
MCFA: Multi-Scale Cascade and Feature Adaptive Alignment Network for Cross-View Geo-Localization.
Sensors (Basel). 2025 Jul 21;25(14):4519. doi: 10.3390/s25144519.
9
An efficient privacy-preserving multilevel fusion-based feature engineering framework for UAV-enabled land cover classification using remote sensing images.
Sci Rep. 2025 Jul 3;15(1):23707. doi: 10.1038/s41598-025-08930-2.
10
Video Instance Segmentation Through Hierarchical Offset Compensation and Temporal Memory Update for UAV Aerial Images.
Sensors (Basel). 2025 Jul 9;25(14):4274. doi: 10.3390/s25144274.
