

An Investigation of the Domain Gap in CLIP-Based Person Re-Identification.

Authors

Asperti Andrea, Naldi Leonardo, Fiorilla Salvatore

Affiliation

Department of Informatics-Science and Engineering (DISI), University of Bologna, 40126 Bologna, Italy.

Publication

Sensors (Basel). 2025 Jan 9;25(2):363. doi: 10.3390/s25020363.

Abstract

Person re-identification (re-id) is a critical computer vision task aimed at identifying individuals across multiple non-overlapping cameras, with wide-ranging applications in intelligent surveillance systems. Despite recent advances, the domain gap (performance degradation when models encounter unseen datasets) remains a critical challenge. CLIP-based models, leveraging multimodal pre-training, offer potential for mitigating this issue by aligning visual and textual representations. In this study, we provide a comprehensive quantitative analysis of the domain gap in CLIP-based re-id systems across standard benchmarks, including Market-1501, DukeMTMC-reID, MSMT17, and Airport, simulating real-world deployment conditions. We systematically measure the performance of these models in terms of mean average precision (mAP) and Rank-1 accuracy, offering insights into the challenges faced during dataset transitions. Our analysis highlights the specific advantages introduced by CLIP's visual-textual alignment and evaluates its contribution relative to strong image encoder baselines. Additionally, we evaluate the impact of extending training sets with non-domain-specific data and incorporating random erasing augmentation, achieving an average improvement of +4.3% in mAP and +4.0% in Rank-1 accuracy. Our findings underscore the importance of standardized benchmarks and systematic evaluations for enhancing reproducibility and guiding future research. This work contributes to a deeper understanding of the domain gap in re-id, while highlighting pathways for improving model robustness and generalization in diverse, real-world scenarios.
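The mAP and Rank-1 metrics used in the abstract can be sketched as follows. This is a minimal NumPy version for illustration only, not the authors' evaluation code; the standard Market-1501 protocol additionally excludes same-camera, same-identity gallery entries per query, which is omitted here for brevity.

```python
import numpy as np

def reid_metrics(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision (mAP) for re-id.

    dist: (num_query, num_gallery) distance matrix (smaller = more similar).
    query_ids / gallery_ids: integer identity labels per image.
    """
    rank1_hits, average_precisions = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])  # gallery indices, most similar first
        matches = (gallery_ids[order] == query_ids[i]).astype(float)
        if matches.sum() == 0:
            continue  # query identity absent from gallery; skip this query
        rank1_hits.append(matches[0])  # 1.0 if the top match is correct
        # Average precision: precision evaluated at each correct-match rank.
        cum_correct = np.cumsum(matches)
        precision_at_hits = cum_correct[matches == 1] / (np.flatnonzero(matches) + 1)
        average_precisions.append(precision_at_hits.mean())
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))
```

Rank-1 asks only whether the single closest gallery image shares the query's identity, while mAP rewards ranking *all* correct gallery images near the top, which is why the two numbers can diverge when several images of the same person exist in the gallery.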


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfa1/11769178/3e75c2a12164/sensors-25-00363-g001.jpg
