
Integrating Vision and Olfaction via Multi-Modal LLM for Robotic Odor Source Localization

Authors

Sunzid Hassan, Lingxiao Wang, Khan Raqib Mahmud

Affiliations

Department of Computer Science, Louisiana Tech University, 201 Mayfield Ave, Ruston, LA 71272, USA.

Department of Electrical Engineering, Louisiana Tech University, 201 Mayfield Ave, Ruston, LA 71272, USA.

Publication

Sensors (Basel). 2024 Dec 10;24(24):7875. doi: 10.3390/s24247875.

Abstract

Odor source localization (OSL) technology allows autonomous agents such as mobile robots to localize a target odor source in an unknown environment. This is achieved by an OSL navigation algorithm that processes the agent's sensor readings to compute action commands that guide the robot to the odor source. Compared to traditional 'olfaction-only' OSL algorithms, our proposed algorithm integrates the vision and olfaction sensor modalities, so it can localize odor sources even when olfaction sensing is disrupted by non-unidirectional airflow or vision sensing is impaired by environmental complexity. The algorithm leverages the zero-shot multi-modal reasoning capabilities of large language models (LLMs), removing the need for manually encoded knowledge or custom-trained supervised learning models. A key feature of the proposed algorithm is the 'High-level Reasoning' module, which encodes the olfaction and vision sensor data into a multi-modal prompt and instructs the LLM to apply a hierarchical reasoning process to select an appropriate high-level navigation behavior. Subsequently, the 'Low-level Action' module translates the selected high-level navigation behavior into low-level action commands that the mobile robot can execute. To validate the algorithm, we implemented it on a mobile robot in a real-world environment with non-unidirectional airflow and obstacles, mimicking a complex, practical search environment. We compared the performance of our proposed algorithm to single-sensory-modality 'olfaction-only' and 'vision-only' navigation algorithms, and to a supervised-learning-based 'vision and olfaction fusion' (Fusion) navigation algorithm. The experimental results show that the proposed LLM-based algorithm outperformed the other algorithms in both success rate and average search time, in both unidirectional and non-unidirectional airflow environments.
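The abstract describes a two-module pipeline: a 'High-level Reasoning' module that packs camera and chemical/airflow readings into a multi-modal prompt and asks the LLM to pick one navigation behavior, and a 'Low-level Action' module that turns that behavior into motor commands. The sketch below illustrates how such a loop might be wired up; the behavior set, sensor fields, control gains, and the `llm_client.chat` API are all illustrative assumptions, not details taken from the paper.

```python
import base64
from dataclasses import dataclass

# Hypothetical sensor snapshot; field names are illustrative, not from the paper.
@dataclass
class Observation:
    camera_jpeg: bytes     # forward-facing camera frame
    gas_ppm: float         # chemical sensor reading
    wind_speed: float      # anemometer: airflow speed (m/s)
    wind_dir_deg: float    # airflow direction relative to robot heading

# Assumed behavior vocabulary; the paper's actual behavior set may differ.
BEHAVIORS = ["plume_tracking", "obstacle_avoidance", "visual_approach", "crosswind_casting"]

def build_prompt(obs: Observation) -> list:
    """'High-level Reasoning' step: encode vision + olfaction into one multi-modal prompt."""
    image_b64 = base64.b64encode(obs.camera_jpeg).decode()
    text = (
        "You are guiding a mobile robot searching for an odor source.\n"
        f"Olfaction: gas = {obs.gas_ppm:.1f} ppm, wind = {obs.wind_speed:.2f} m/s "
        f"at {obs.wind_dir_deg:.0f} deg from heading.\n"
        "First decide whether the attached camera frame shows a plausible source or "
        f"an obstacle; then pick exactly one behavior from {BEHAVIORS} "
        "and answer with that word only."
    )
    # OpenAI-style multi-modal message layout, used here only as an example format.
    return [{"role": "user", "content": [
        {"type": "text", "text": text},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ]}]

def low_level_action(behavior: str, obs: Observation) -> tuple[float, float]:
    """'Low-level Action' step: map the chosen behavior to (linear, angular) velocity.
    Gains and the casting heuristic are placeholders, not the paper's values."""
    if behavior == "plume_tracking":      # steer upwind toward the odor plume
        return 0.3, -0.01 * obs.wind_dir_deg
    if behavior == "visual_approach":     # drive straight at the visually detected source
        return 0.3, 0.0
    if behavior == "crosswind_casting":   # sweep across the wind to reacquire the plume
        return 0.2, 0.5
    return 0.1, 0.8                       # obstacle_avoidance: slow turn away

def step(llm_client, obs: Observation) -> tuple[float, float]:
    """One control cycle: prompt the LLM, validate its reply, emit a velocity command."""
    reply = llm_client.chat(messages=build_prompt(obs))   # hypothetical client API
    behavior = reply.strip() if reply.strip() in BEHAVIORS else "crosswind_casting"
    return low_level_action(behavior, obs)
```

The fallback to a casting behavior when the reply does not parse is likewise a guess; the general point is that the LLM's free-text choice should be validated against a fixed behavior vocabulary before it is allowed to drive the robot.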


Figure 1 (graphical overview): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f076/11678985/ae5763ab3dff/sensors-24-07875-g001.jpg
