DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing.

Authors

Huang Gaoshuang, Zhou Yang, Hu Xiaofei, Zhang Chenglong, Zhao Luying, Gan Wenjian

Affiliation

Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou, 450001, China.

Publication

Sci Rep. 2024 Sep 27;14(1):22100. doi: 10.1038/s41598-024-73853-3.

DOI: 10.1038/s41598-024-73853-3
PMID: 39333370
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11437288/
Abstract

Using visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions, is generally unsatisfactory. Obtaining efficient and robust image feature descriptors in such environments therefore remains an open challenge. In this study, we used the DINOv2 model as the backbone, trimming and fine-tuning it to extract robust image features, and employed a feature-mix module to aggregate those features into globally robust, generalizable descriptors that enable high-precision VPR. Experiments demonstrate that the proposed DINO-Mix outperforms current state-of-the-art (SOTA) methods. On test sets with lighting variations, seasonal changes, and occlusions, such as Tokyo24/7, Nordland, and SF-XL-Testv1, the proposed architecture achieved Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively, an average improvement of 5.14%. We also compared it with other SOTA methods on representative image retrieval case studies, where it outperformed its competitors in VPR performance. Finally, we visualized the attention maps of DINO-Mix and competing methods to give a more intuitive view of their respective strengths; these visualizations provide compelling evidence of the superiority of the DINO-Mix framework.
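The pipeline the abstract describes (backbone patch features → mix-style aggregation into one global descriptor → nearest-neighbor retrieval scored by Top-1 accuracy) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the feature shapes, the single mixer layer, and the helper names (`mix_aggregate`, `top1_accuracy`) are assumptions for demonstration only.

```python
import numpy as np

def mix_aggregate(patch_feats, W_token, W_chan):
    # patch_feats: (N, D) patch-level features from a ViT-style backbone.
    # MLP-Mixer-style aggregation: mix across tokens, then across channels,
    # then average-pool and L2-normalize into a single global descriptor.
    mixed = W_token @ patch_feats           # token mixing: (N, D)
    mixed = np.tanh(mixed) @ W_chan         # channel mixing: (N, D)
    g = mixed.mean(axis=0)                  # global pooling: (D,)
    return g / np.linalg.norm(g)

def top1_accuracy(query_desc, db_desc, query_gt):
    # query_desc: (Q, D) L2-normalized query descriptors
    # db_desc:    (M, D) L2-normalized database descriptors
    # query_gt:   for each query, the index of the correct database image
    sims = query_desc @ db_desc.T           # cosine similarity: (Q, M)
    top1 = sims.argmax(axis=1)              # best-matching database image
    return float((top1 == np.asarray(query_gt)).mean())

rng = np.random.default_rng(0)
N, D = 16, 8
W_token = rng.standard_normal((N, N)) / np.sqrt(N)
W_chan = rng.standard_normal((D, D)) / np.sqrt(D)

# Toy database of 5 places; each query is a slightly perturbed view of one.
db_feats = [rng.standard_normal((N, D)) for _ in range(5)]
db_desc = np.stack([mix_aggregate(f, W_token, W_chan) for f in db_feats])
q_desc = np.stack([mix_aggregate(f + 0.05 * rng.standard_normal((N, D)),
                                 W_token, W_chan) for f in db_feats])

acc = top1_accuracy(q_desc, db_desc, list(range(5)))
print(acc)
```

In a real VPR evaluation the database holds geotagged reference images and a query counts as correct when its Top-1 match lies within a distance threshold of the query's true location; the toy identity ground truth above stands in for that check.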


Similar Articles

1
DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing.
Sci Rep. 2024 Sep 27;14(1):22100. doi: 10.1038/s41598-024-73853-3.
2
Convolutional MLP orthogonal fusion of multiscale features for visual place recognition.
Sci Rep. 2024 May 23;14(1):11756. doi: 10.1038/s41598-024-62749-x.
3
An Appearance-Semantic Descriptor with Coarse-to-Fine Matching for Robust VPR.
Sensors (Basel). 2024 Mar 29;24(7):2203. doi: 10.3390/s24072203.
4
Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery.
Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1013-1020. doi: 10.1007/s11548-024-03083-5. Epub 2024 Mar 8.
5
Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition.
Sensors (Basel). 2024 Jan 28;24(3):855. doi: 10.3390/s24030855.
6
SVS-VPR: A Semantic Visual and Spatial Information-Based Hierarchical Visual Place Recognition for Autonomous Navigation in Challenging Environmental Conditions.
Sensors (Basel). 2024 Jan 30;24(3):906. doi: 10.3390/s24030906.
7
Foundation models in gastrointestinal endoscopic AI: Impact of architecture, pre-training approach and data efficiency.
Med Image Anal. 2024 Dec;98:103298. doi: 10.1016/j.media.2024.103298. Epub 2024 Aug 12.
8
Image partitioning and illumination in image-based pose detection for teleoperated flexible endoscopes.
Artif Intell Med. 2013 Nov;59(3):185-96. doi: 10.1016/j.artmed.2013.09.002. Epub 2013 Oct 10.
9
Myo-regressor Deep Informed Neural NetwOrk (Myo-DINO) for fast MR parameters mapping in neuromuscular disorders.
Comput Methods Programs Biomed. 2024 Nov;256:108399. doi: 10.1016/j.cmpb.2024.108399. Epub 2024 Aug 28.
10
Distributed training of CosPlace for large-scale visual place recognition.
Front Robot AI. 2024 May 20;11:1386464. doi: 10.3389/frobt.2024.1386464. eCollection 2024.

Cited By

1
Explainable self-supervised learning for medical image diagnosis based on DINO V2 model and semantic search.
Sci Rep. 2025 Sep 1;15(1):32174. doi: 10.1038/s41598-025-15604-6.

References

1
Rich learning representations for human activity recognition: How to empower deep feature learning for biological time series.
J Biomed Inform. 2022 Oct;134:104180. doi: 10.1016/j.jbi.2022.104180. Epub 2022 Aug 27.
2
Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition.
IEEE Trans Neural Netw Learn Syst. 2020 Feb;31(2):661-674. doi: 10.1109/TNNLS.2019.2908982. Epub 2019 Apr 26.
3
Fine-Tuning CNN Image Retrieval with No Human Annotation.
IEEE Trans Pattern Anal Mach Intell. 2019 Jul;41(7):1655-1668. doi: 10.1109/TPAMI.2018.2846566. Epub 2018 Jun 12.
4
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition.
IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1437-1451. doi: 10.1109/TPAMI.2017.2711011. Epub 2017 Jun 1.
5
24/7 Place Recognition by View Synthesis.
IEEE Trans Pattern Anal Mach Intell. 2018 Feb;40(2):257-271. doi: 10.1109/TPAMI.2017.2667665. Epub 2017 Feb 13.
6
Visual place recognition with repetitive structures.
IEEE Trans Pattern Anal Mach Intell. 2015 Nov;37(11):2346-59. doi: 10.1109/TPAMI.2015.2409868.
7
Aggregating local image descriptors into compact codes.
IEEE Trans Pattern Anal Mach Intell. 2012 Sep;34(9):1704-16. doi: 10.1109/TPAMI.2011.235.