

Query by Example: Semantic Traffic Scene Retrieval Using LLM-Based Scene Graph Representation.

Authors

Tian Yafu, Carballo Alexander, Li Ruifeng, Thompson Simon, Takeda Kazuya

Affiliations

Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan.

State Key Laboratory of Robotic and Intelligent System, Harbin Institute of Technology, Harbin 150000, China.

Publication details

Sensors (Basel). 2025 Apr 17;25(8):2546. doi: 10.3390/s25082546.

DOI:10.3390/s25082546
PMID:40285243
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12031543/
Abstract

In autonomous driving, retrieving a specific traffic scene from huge datasets is a significant challenge. Traditional scene retrieval methods struggle to cope with the semantic complexity and heterogeneity of traffic scenes and are unable to meet the variable needs of different users. This paper proposes "Query-by-Example", a traffic scene retrieval approach based on Visual-Large Language Model (VLM)-generated Road Scene Graph (RSG) representation. Our method uses VLMs to generate structured scene graphs from video data, capturing high-level semantic attributes and detailed object relationships in traffic scenes. We introduce an extensible set of scene attributes and a graph-based scene description to quantify scene similarity. We also propose an RSG-LLM benchmark dataset containing 1000 traffic scenes, their corresponding natural language descriptions, and RSGs to evaluate the performance of LLMs in generating RSGs. Experiments show that our method can effectively retrieve semantically similar traffic scenes from large databases, supporting various query formats, including natural language, images, video clips, rosbag, etc. Our method provides a comprehensive and flexible framework for traffic scene retrieval, promoting its application in autonomous driving systems.
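The abstract does not say how similarity between two RSGs is computed. The sketch below is for illustration only and is not the authors' method. It assumes a simplified RSG made of scene-level attributes plus (subject, relation, object) triplets. Each candidate scene is scored with a weighted mix of attribute agreement and Jaccard overlap of its triplets. `SceneGraph`, `scene_similarity`, `retrieve`, the weight `w_attr`, and the toy scenes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical, simplified road scene graph (RSG); not the paper's schema."""

    # Scene-level semantic attributes, e.g. {"weather": "rain"}.
    attributes: dict[str, str] = field(default_factory=dict)
    # Object relationships as (subject, relation, object) triplets.
    relations: frozenset[tuple[str, str, str]] = frozenset()


def attribute_similarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Fraction of attribute keys (union of both scenes) whose values agree."""
    keys = a.keys() | b.keys()
    if not keys:
        return 1.0
    matches = sum(1 for k in keys if a.get(k) == b.get(k))
    return matches / len(keys)


def relation_similarity(r1: frozenset, r2: frozenset) -> float:
    """Jaccard overlap of relationship triplets."""
    union = r1 | r2
    if not union:
        return 1.0
    return len(r1 & r2) / len(union)


def scene_similarity(q: SceneGraph, c: SceneGraph, w_attr: float = 0.5) -> float:
    """Assumed scoring scheme: weighted sum of attribute and relation similarity."""
    return (w_attr * attribute_similarity(q.attributes, c.attributes)
            + (1.0 - w_attr) * relation_similarity(q.relations, c.relations))


def retrieve(query: SceneGraph, database: dict[str, SceneGraph],
             top_k: int = 3) -> list[tuple[str, float]]:
    """Rank stored scenes by similarity to the query graph, best first."""
    scored = [(sid, scene_similarity(query, g)) for sid, g in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy query RSG, written by hand here (in the paper a VLM would generate it).
query = SceneGraph(
    {"weather": "rain", "road": "intersection"},
    frozenset({("ego", "follows", "truck"), ("pedestrian", "crosses", "crosswalk")}),
)
database = {
    "scene_a": SceneGraph({"weather": "rain", "road": "intersection"},
                          frozenset({("ego", "follows", "truck")})),
    "scene_b": SceneGraph({"weather": "clear", "road": "highway"},
                          frozenset({("ego", "overtakes", "car")})),
}
print(retrieve(query, database, top_k=2))  # [('scene_a', 0.75), ('scene_b', 0.0)]
```

In this framing, any query modality (text, image, clip, rosbag) would first be converted to an RSG, so the comparison step stays the same regardless of input format. Only that comparison is sketched here.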


Similar articles

1. Remote sensing traffic scene retrieval based on learning control algorithm for robot multimodal sensing information fusion and human-machine interaction and collaboration.
   Front Neurorobot. 2023 Oct 11;17:1267231. doi: 10.3389/fnbot.2023.1267231. eCollection 2023.
2. Image Captioning Based on Semantic Scenes.
   Entropy (Basel). 2024 Oct 18;26(10):876. doi: 10.3390/e26100876.
3. A Comprehensive Survey of Scene Graphs: Generation and Application.
   IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):1-26. doi: 10.1109/TPAMI.2021.3137605. Epub 2022 Dec 5.
4. Semantic-based surveillance video retrieval.
   IEEE Trans Image Process. 2007 Apr;16(4):1168-81. doi: 10.1109/tip.2006.891352.
5. Knowledge-infused Learning for Entity Prediction in Driving Scenes.
   Front Big Data. 2021 Nov 25;4:759110. doi: 10.3389/fdata.2021.759110. eCollection 2021.
6. [A retrieval method of drug molecules based on graph collapsing].
   Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):368-374.
7. Interactive Visual Pattern Search on Graph Data via Graph Representation Learning.
   IEEE Trans Vis Comput Graph. 2022 Jan;28(1):335-345. doi: 10.1109/TVCG.2021.3114857. Epub 2021 Dec 24.
8. Fine-Grained Video Retrieval With Scene Sketches.
   IEEE Trans Image Process. 2023;32:3136-3149. doi: 10.1109/TIP.2023.3278474. Epub 2023 Jun 2.
9. The Linguistic Analysis of Scene Semantics: LASS.
   Behav Res Methods. 2020 Dec;52(6):2349-2371. doi: 10.3758/s13428-020-01390-8.
