Liu Qiming, Wang Guangzhan, Liu Zhe, Wang Hesheng
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9512-9523. doi: 10.1109/TNNLS.2024.3418857. Epub 2025 May 2.
Autonomous cognition is a fundamental prerequisite for embodied agents to make intelligent decisions. Typically, agents optimize decision-making by drawing on extensive spatiotemporal information from episodic memory, while using long-term experience for task reasoning and for developing deliberate behavioral tendencies. However, because these two cognitive abilities rely on heterogeneous modalities that differ significantly, existing work has not produced effective mechanisms for coupling them and therefore fails to endow robots with comprehensive intelligence. This article introduces hierarchical topology-semantic cognitive navigation (HTSCN), a navigation framework that integrates memory and reasoning abilities within a single end-to-end system. Specifically, we represent memory with a topological map and reasoning with a semantic relation graph, combined in a unified dual-layer graph structure. We also incorporate a neural cognition extraction process that captures cross-modal relationships between the two graph layers. By linking the two cognitive modalities, HTSCN enhances decision-making performance and the overall level of intelligence. Experimental results show that, compared with existing cognitive structures, HTSCN significantly improves the performance and path efficiency of image-goal navigation. Visualization and interpretability experiments further confirm that memory, reasoning, and the relationships between them learned online all promote intelligent behavioral patterns. Finally, we deploy HTSCN in real-world scenarios to verify its feasibility and adaptability.
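The dual-layer graph described in the abstract can be pictured with a small data-structure sketch. The following is only an illustration under stated assumptions, not the authors' implementation: the class `DualLayerGraph`, its field and method names, and the dot-product attention in `attend_places` are all hypothetical stand-ins. The abstract does not specify how the topological map, the semantic relation graph, or the cognition extraction process are actually built.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DualLayerGraph:
    """Hypothetical two-layer container; not the HTSCN authors' data structure."""
    place_feats: dict = field(default_factory=dict)   # topological layer: place id -> feature vector
    place_edges: set = field(default_factory=set)     # undirected (place id, place id) pairs
    object_feats: dict = field(default_factory=dict)  # semantic layer: category -> embedding
    object_edges: dict = field(default_factory=dict)  # (category, category) -> relation weight
    cross_edges: set = field(default_factory=set)     # cross-layer links: (place id, category)

    def add_place(self, pid, feat, neighbors=()):
        # Store a visited place and connect it to already-known neighbors.
        self.place_feats[pid] = list(feat)
        for n in neighbors:
            self.place_edges.add(tuple(sorted((pid, n))))

    def add_object(self, category, feat):
        self.object_feats[category] = list(feat)

    def relate(self, a, b, weight=1.0):
        # Record a semantic relation (e.g. co-occurrence) between two categories.
        self.object_edges[tuple(sorted((a, b)))] = weight

    def observe(self, pid, category):
        # Link a place node to an object category seen there.
        self.cross_edges.add((pid, category))

    def objects_at(self, pid):
        return sorted(c for p, c in self.cross_edges if p == pid)


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_places(graph, query):
    """Dot-product attention of a goal embedding over place nodes (an assumed
    stand-in for a learned cross-modal extraction step)."""
    ids = sorted(graph.place_feats)
    scores = [sum(q * x for q, x in zip(query, graph.place_feats[i])) for i in ids]
    return dict(zip(ids, softmax(scores)))


# Toy usage with made-up 2-D features.
g = DualLayerGraph()
g.add_place(0, [1.0, 0.0])
g.add_place(1, [0.0, 1.0], neighbors=[0])
g.add_object("sofa", [1.0, 0.0])
g.add_object("tv", [0.9, 0.1])
g.relate("sofa", "tv", 0.8)
g.observe(0, "sofa")
weights = attend_places(g, [1.0, 0.0])  # place 0 receives the larger weight
```

Keeping the two layers as separate node sets joined by explicit cross-layer links mirrors the abstract's separation of memory (places) from reasoning (object relations). In a learned system the fixed dot product would presumably be replaced by trained graph or attention layers.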