Q-learning with temporal memory to navigate turbulence.

Author information

Rando Marco, James Martin, Verri Alessandro, Rosasco Lorenzo, Seminara Agnese

Affiliations

MaLGa, Department of Computer Science, Bioengineering, Robotics and Systems Engineering, University of Genova, Genoa, Italy.

MaLGa, Department of Civil, Chemical and Environmental Engineering, University of Genoa, Genova, Italy.

Publication information

Elife. 2025 Jul 21;13:RP102906. doi: 10.7554/eLife.102906.

DOI:10.7554/eLife.102906

PMID:40690282

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12279376/

Abstract

We consider the problem of olfactory searches in a turbulent environment. We focus on agents that respond solely to odor stimuli, with no access to spatial perception or prior information about the odor. We ask whether navigation to a target can be learned robustly within a sequential decision-making framework. We develop a reinforcement learning algorithm using a small set of interpretable olfactory states and train it with realistic turbulent odor cues. By introducing a temporal memory, we demonstrate that two salient features of odor traces, discretized in a few olfactory states, are sufficient to learn navigation in a realistic odor plume. Performance is dictated by the sparse nature of turbulent odors. An optimal memory exists that ignores blanks within the plume and activates a recovery strategy outside the plume. We obtain the best performance by letting agents learn their recovery strategy and show that it is mostly crosswind casting, similar to behavior observed in flying insects. The optimal strategy is robust to substantial changes in the odor plumes, suggesting minor parameter tuning may be sufficient to adapt to different environments.
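
The idea of a temporal memory feeding a small set of olfactory states into Q-learning can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the two features are taken to be intermittency and mean intensity over the memory window, and the bin count, detection threshold, action set, and the function names `olfactory_state`, `q_update` and `epsilon_greedy` are hypothetical. The update rule itself is standard tabular Q-learning.

```python
import numpy as np

# All constants below are hypothetical choices for illustration.
N_BINS = 3                    # bins per feature
VOID = N_BINS * N_BINS        # extra state: no odor detected within the memory
N_STATES = VOID + 1
N_ACTIONS = 4                 # e.g. upwind, downwind, crosswind left/right


def olfactory_state(trace, memory, threshold=0.1, max_intensity=1.0):
    """Map the last `memory` odor samples to a discrete state index.

    Assumed features: intermittency (fraction of samples above threshold)
    and mean intensity over the detections in the window.
    """
    window = np.asarray(trace[-memory:], dtype=float)
    detected = window > threshold
    if not detected.any():
        # Nothing sensed for the whole memory window: the agent is treated
        # as outside the plume and should use its recovery strategy.
        return VOID
    intermittency = detected.mean()
    intensity = window[detected].mean()
    i_bin = min(int(intermittency * N_BINS), N_BINS - 1)
    c_bin = min(int(intensity / max_intensity * N_BINS), N_BINS - 1)
    return i_bin * N_BINS + c_bin


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One standard tabular Q-learning update, applied in place."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(Q[s].argmax())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = np.zeros((N_STATES, N_ACTIONS))
    trace = [0.0, 0.0, 0.5, 0.0]          # a detection followed by a blank
    s = olfactory_state(trace, memory=2)  # memory long enough to span the blank
    a = epsilon_greedy(Q, s, epsilon=0.1, rng=rng)
    q_update(Q, s, a, r=-1.0, s_next=VOID)
    print(s, a, Q[s, a])
```

In this sketch the memory length decides when a blank counts as leaving the plume: with `memory=1` the trailing zero in `trace` maps to the void state, whereas with `memory=2` the earlier detection is still remembered and the blank is ignored. Because the void state has its own row in `Q`, the recovery strategy is learned like any other action choice.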
