Tong William L, Iyer Anisha, Murthy Venkatesh N, Reddy Gautam
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA.
University of California, Berkeley, CA, USA.
bioRxiv. 2023 Dec 5:2023.12.03.569774. doi: 10.1101/2023.12.03.569774.
Dogs and laboratory mice are commonly trained to perform complex tasks by guiding them through a curriculum of simpler tasks ('shaping'). What are the principles behind effective shaping strategies? Here, we propose a machine learning framework for shaping animal behavior, where an autonomous teacher agent decides its student's task based on the student's transcript of successes and failures on previously assigned tasks. Using autonomous teachers that plan a curriculum in a common sequence learning task, we show that near-optimal shaping algorithms adaptively alternate between simpler and harder tasks to carefully balance reinforcement and extinction. Based on this intuition, we derive an adaptive shaping heuristic with minimal parameters, which we show is near-optimal on the sequence learning task and robustly trains deep reinforcement learning agents on navigation tasks that involve sparse, delayed rewards. We also explore extensions to continuous curricula. Our work provides a starting point towards a general computational framework for shaping animal behavior.
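To make the shaping idea concrete, the sketch below illustrates one way an autonomous teacher could assign the next task from a student's transcript of successes and failures: advance to a harder task when the recent success rate is high, and fall back to a simpler one when it is low, echoing the adaptive alternation between reinforcement and extinction described in the abstract. This is a minimal illustration under assumed parameters, not the paper's published heuristic; the class name `AdaptiveShapingTeacher`, the window size, and the two thresholds are hypothetical.

```python
from collections import deque


class AdaptiveShapingTeacher:
    """Illustrative curriculum teacher (hypothetical; not the paper's exact heuristic).

    Tracks a sliding window of the student's trial outcomes on the current task
    level and moves the student up or down the curriculum accordingly.
    """

    def __init__(self, n_levels, window=20, advance_rate=0.8, retreat_rate=0.4):
        self.n_levels = n_levels          # number of curriculum stages, easiest = 0
        self.level = 0                    # task level currently assigned to the student
        self.window = window              # transcript length used to estimate success rate
        self.advance_rate = advance_rate  # assign a harder task above this success rate
        self.retreat_rate = retreat_rate  # fall back to an easier task below this rate
        self.transcript = deque(maxlen=window)

    def record(self, success: bool) -> int:
        """Log one trial outcome and return the next task level to assign."""
        self.transcript.append(success)
        if len(self.transcript) == self.window:
            rate = sum(self.transcript) / self.window
            if rate >= self.advance_rate and self.level < self.n_levels - 1:
                self.level += 1           # reinforcement is reliable: raise difficulty
                self.transcript.clear()
            elif rate <= self.retreat_rate and self.level > 0:
                self.level -= 1           # risk of extinction: return to a simpler task
                self.transcript.clear()
        return self.level


# Usage sketch: the environment (not shown) reports each trial's outcome,
# and the teacher replies with the task level for the next trial.
teacher = AdaptiveShapingTeacher(n_levels=5)
next_level = teacher.record(success=True)
```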