Zhang Hongjie, Deng Hourui, Ou Jie, Feng Chaosheng
College of Computer Science, Sichuan Normal University, Chengdu, 610101, China.
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
Sci Rep. 2025 Mar 14;15(1):8881. doi: 10.1038/s41598-025-93601-5.
Spatial reasoning in Large Language Models (LLMs) serves as a foundation for embodied intelligence. However, even in simple maze environments, LLMs often struggle to plan correct paths due to hallucination issues. To address this, we propose S2ERS, an LLM-based technique that integrates entity and relation extraction with the on-policy reinforcement learning algorithm Sarsa for optimal path planning. We introduce three key improvements: (1) To tackle spatial hallucination, we extract a graph structure of entities and relations from the text-based maze description, aiding LLMs in accurately comprehending spatial relationships. (2) To prevent LLMs from getting trapped in dead ends due to context-inconsistency hallucination during long-term reasoning, we insert the state-action value function Q into the prompts, guiding the LLM's path planning. (3) To reduce the token consumption of LLMs, we employ multi-step reasoning, dynamically inserting local Q-tables into the prompt so that the LLM can output multiple actions at once. Our comprehensive experimental evaluation, conducted with the closed-source LLMs ChatGPT 3.5 and ERNIE-Bot 4.0 and the open-source LLM ChatGLM-6B, demonstrates that S2ERS significantly mitigates spatial hallucination in LLMs and improves the success rate and optimal rate by approximately 29% and 19%, respectively, compared with state-of-the-art chain-of-thought (CoT) methods.
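To make the Sarsa component of the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): tabular on-policy Sarsa on a toy grid maze, plus a helper that renders the local Q-table for the current state into a prompt fragment that an LLM could condition on when planning its next actions. The maze layout, reward scheme, hyperparameters, and prompt wording are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: tabular Sarsa on a toy maze,
# with a helper that formats local Q-values into an LLM prompt fragment.
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# 0 = free cell, 1 = wall; start at (0, 0), goal at (2, 3). Illustrative layout.
MAZE = [
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
START, GOAL = (0, 0), (2, 3)


def step(state, action):
    """Apply an action; blocked or out-of-bounds moves leave the state unchanged."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] == 0:
        state = (nr, nc)
    reward = 0.0 if state == GOAL else -1.0  # per-step cost favors short paths
    return state, reward, state == GOAL


def epsilon_greedy(Q, state, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def train_sarsa(episodes=500, alpha=0.5, gamma=0.95):
    Q = defaultdict(float)
    for _ in range(episodes):
        s, a = START, epsilon_greedy(Q, START)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2)
            # On-policy Sarsa update: bootstrap with the action actually taken next.
            Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)])
            s, a = s2, a2
    return Q


def local_q_prompt(Q, state):
    """Render the local Q-table for one state as a prompt fragment (assumed format)."""
    lines = [f"Q-values at cell {state}:"]
    lines += [f"  {a}: {Q[(state, a)]:.2f}" for a in ACTIONS]
    lines.append("Prefer actions with higher Q-values when planning the next steps.")
    return "\n".join(lines)


if __name__ == "__main__":
    Q = train_sarsa()
    print(local_q_prompt(Q, START))
```

In this sketch, `local_q_prompt` stands in for the paper's idea of dynamically inserting a local Q-table into the prompt; how S2ERS actually serializes Q-values, selects the local neighborhood, and parses the LLM's multi-step action output is specified in the paper itself, not here.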