Query by Example: Semantic Traffic Scene Retrieval Using LLM-Based Scene Graph Representation.

Authors

Tian Yafu, Carballo Alexander, Li Ruifeng, Thompson Simon, Takeda Kazuya

Affiliations

Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan.

State Key Laboratory of Robotic and Intelligent System, Harbin Institute of Technology, Harbin 150000, China.

Publication information

Sensors (Basel). 2025 Apr 17;25(8):2546. doi: 10.3390/s25082546.

Abstract

In autonomous driving, retrieving a specific traffic scene from very large datasets is a significant challenge. Traditional scene retrieval methods struggle with the semantic complexity and heterogeneity of traffic scenes and cannot meet the varying needs of different users. This paper proposes "Query-by-Example", a traffic scene retrieval approach based on a Road Scene Graph (RSG) representation generated by a Visual Large Language Model (VLM). The method uses VLMs to generate structured scene graphs from video data, capturing both high-level semantic attributes and detailed object relationships in traffic scenes. We introduce an extensible set of scene attributes and a graph-based scene description to quantify scene similarity. We also propose an RSG-LLM benchmark dataset containing 1000 traffic scenes, their corresponding natural language descriptions, and their RSGs, to evaluate the performance of LLMs in generating RSGs. Experiments show that the method can effectively retrieve semantically similar traffic scenes from large databases and supports various query formats, including natural language, images, video clips, and rosbag files. The method provides a comprehensive and flexible framework for traffic scene retrieval, promoting its application in autonomous driving systems.
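The abstract does not specify how scene similarity is computed, so the following is only an illustrative sketch of the general idea of combining scene attributes with a graph-based description. Everything in it is an assumption made for illustration rather than the authors' method: the `RoadSceneGraph` container, the attribute names, the use of Jaccard overlap on (subject, relation, object) triples, and the `attr_weight` parameter.

```python
from dataclasses import dataclass


@dataclass
class RoadSceneGraph:
    """Hypothetical container: scene-level attributes plus relation triples."""
    attributes: dict[str, str]              # e.g. {"weather": "rain"} (assumed keys)
    triples: set[tuple[str, str, str]]      # (subject, relation, object)


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def scene_similarity(query: RoadSceneGraph,
                     candidate: RoadSceneGraph,
                     attr_weight: float = 0.5) -> float:
    """Weighted mix of attribute agreement and triple overlap (assumed scoring)."""
    keys = set(query.attributes) | set(candidate.attributes)
    if keys:
        matches = sum(
            1 for k in keys
            if query.attributes.get(k) == candidate.attributes.get(k)
        )
        attr_score = matches / len(keys)
    else:
        attr_score = 1.0
    graph_score = jaccard(query.triples, candidate.triples)
    return attr_weight * attr_score + (1.0 - attr_weight) * graph_score


def retrieve(query: RoadSceneGraph,
             database: dict[str, RoadSceneGraph],
             top_k: int = 3) -> list[str]:
    """Return the names of the top_k scenes most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: scene_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    query = RoadSceneGraph(
        {"weather": "rain", "road": "intersection"},
        {("ego", "follows", "car"), ("pedestrian", "crosses", "road")},
    )
    database = {
        "scene_a": RoadSceneGraph(dict(query.attributes), set(query.triples)),
        "scene_b": RoadSceneGraph(
            {"weather": "clear", "road": "highway"},
            {("ego", "overtakes", "truck")},
        ),
    }
    print(retrieve(query, database, top_k=1))
```

In this sketch the query is itself a scene graph, which mirrors the "query by example" idea: any input format would first be converted to the same graph representation and then ranked against the stored scenes.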
