Sejnowski Terrence J, Poizner Howard, Lynch Gary, Gepshtein Sergei, Greenspan Ralph J
Howard Hughes Medical Institute, Salk Institute for Biological Sciences, La Jolla, CA 92037 USA and the Division of Biological Studies, University of California at San Diego, La Jolla, CA 92093 USA
Institute for Neural Computation, University of California at San Diego, La Jolla, CA 92093-0523 USA
Proc IEEE Inst Electr Electron Eng. 2014 May;102(5). doi: 10.1109/JPROC.2014.2314297.
Human performance approaches that of an ideal observer and optimal actor in some perceptual and motor tasks. These optimal abilities depend on the capacity of the cerebral cortex to store an immense amount of information and to make rapid decisions flexibly. However, behavior approaches these limits only after a long period of learning, during which the cerebral cortex interacts with the basal ganglia, an ancient part of the vertebrate brain that is responsible for learning sequences of actions directed toward achieving goals. Progress has been made in understanding the algorithms used by the brain during reinforcement learning, which is an online approximation of dynamic programming. Humans also make plans that depend on past experience by simulating different scenarios, which is called prospective optimization. The same brain structures in the cortex and basal ganglia that are active online during optimal behavior are also active offline during prospective optimization. The emergence of general principles and algorithms for goal-directed behavior has consequences for the development of autonomous devices in engineering applications.
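The statement that reinforcement learning is an online approximation of dynamic programming can be illustrated with a small sketch. It is not taken from the article: the five-state random-walk task, the function names (`outcomes`, `dp_evaluate`, `td0_evaluate`), and the step size are all assumptions chosen for illustration. Dynamic programming sweeps over every state using the full transition model, whereas temporal-difference learning, used here as one common reinforcement learning rule, updates a single state at a time from sampled experience and should settle near the same values.

```python
import random

N_STATES = 5   # hypothetical chain: non-terminal states 0..4, exits at both ends
GAMMA = 1.0    # no discounting in this toy task


def outcomes(s):
    """Transition model: (probability, reward, next_state) for each move.
    Exiting on the right pays 1, exiting on the left pays 0; None marks the end."""
    result = []
    for move in (-1, 1):
        nxt = s + move
        if nxt < 0:
            result.append((0.5, 0.0, None))
        elif nxt >= N_STATES:
            result.append((0.5, 1.0, None))
        else:
            result.append((0.5, 0.0, nxt))
    return result


def value_of(v, s):
    """Terminal states have value 0."""
    return 0.0 if s is None else v[s]


def dp_evaluate(sweeps=500):
    """Dynamic programming: full Bellman backups over all states, using the model."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [
            sum(p * (r + GAMMA * value_of(v, nxt)) for p, r, nxt in outcomes(s))
            for s in range(N_STATES)
        ]
    return v


def td0_evaluate(episodes=5000, alpha=0.01, seed=0):
    """TD(0): the same backup, applied online to one sampled transition at a time."""
    rng = random.Random(seed)
    v = [0.0] * N_STATES
    for _ in range(episodes):
        s = N_STATES // 2  # each episode starts in the middle of the chain
        while s is not None:
            # Both moves are equally likely, so a uniform choice samples the model.
            _, r, nxt = rng.choice(outcomes(s))
            v[s] += alpha * (r + GAMMA * value_of(v, nxt) - v[s])
            s = nxt
    return v


if __name__ == "__main__":
    print("dynamic programming:", [round(x, 3) for x in dp_evaluate()])
    print("TD(0) from samples: ", [round(x, 3) for x in td0_evaluate()])
```

In this sketch the learner never reads the transition probabilities when updating its estimates; it only observes sampled rewards and next states, which is the sense in which the update is online.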