Saito Hiroshi, Katahira Kentaro, Okanoya Kazuo, Okada Masato
Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan.
Phys Rev E Stat Nonlin Soft Matter Phys. 2011 May;83(5 Pt 1):051125. doi: 10.1103/PhysRevE.83.051125. Epub 2011 May 20.
Neural networks can learn flexible input-output associations by changing their synaptic weights. The representational performance and learning dynamics of neural networks have been studied intensively in several fields. Neural networks face the "credit assignment problem" when only incomplete performance evaluations are available: the network must assign credit or blame to its behaviors according to their contribution to network performance. In reinforcement learning, a scalar evaluation signal is delivered to the network. The two main types of credit assignment problem in reinforcement learning are structural and temporal, that is, determining which parameters of the network (structural) and which past network activities (temporal) are related to an evaluation signal given by the environment. In this study, we apply statistical mechanical analysis to the learning process of a simple neural network model to clarify the effects of the two kinds of credit assignment and their interaction. Our model is based on node perturbation learning with an eligibility trace. Node perturbation is a stochastic gradient learning method that can solve the structural credit assignment problem by introducing a perturbation into the system output. The eligibility trace retains past network activities, weighted by a temporal credit, to cope with the delay of an instruction signal. We show that the two credit assignment effects interact and that the optimal time constant of the eligibility trace depends not only on the evaluation delay but also on the network size.
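The abstract does not state the update rule, so the following is only an illustrative sketch of node perturbation with an eligibility trace, not the authors' model. It assumes a linear student perceptron learning from a linear teacher, Gaussian inputs and perturbations, an exponentially decaying trace, and an evaluation signal that arrives a fixed number of steps late. All names and default values (`train`, `decay`, `delay`, `eta`, `sigma`) are hypothetical.

```python
import random
from collections import deque


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train(n=10, steps=6000, delay=2, decay=0.7, eta=0.02, sigma=0.1, seed=0):
    """Node perturbation with an eligibility trace (assumed toy setup).

    Returns the normalized squared distance between student and teacher
    weights before and after training.
    """
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed linear teacher
    w = [0.0] * n        # student weights
    trace = [0.0] * n    # eligibility trace
    pending = deque()    # evaluation signals not yet delivered
    initial_error = squared_distance(w, teacher) / n

    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)]
        xi = rng.gauss(0.0, sigma)  # perturbation added to the output node
        residual = dot(teacher, x) - dot(w, x)

        # Scalar evaluation: change in squared error caused by the perturbation.
        signal = 0.5 * (residual - xi) ** 2 - 0.5 * residual ** 2
        pending.append(signal)

        # Temporal credit: decay old activity, add the current perturbed input.
        trace = [decay * e + xi * xj for e, xj in zip(trace, x)]

        # The evaluation from `delay` steps ago arrives now and is matched
        # against the current trace (structural credit via the perturbation).
        if len(pending) > delay:
            delayed = pending.popleft()
            scale = eta * delayed / sigma ** 2
            w = [wj - scale * e for wj, e in zip(w, trace)]

    final_error = squared_distance(w, teacher) / n
    return initial_error, final_error


if __name__ == "__main__":
    before, after = train()
    print(f"error before: {before:.4f}, after: {after:.6f}")
```

In this sketch the trace weights the delayed perturbation by `decay ** delay`, so a slower decay preserves more of the relevant signal but also mixes in more unrelated perturbations. Varying `decay`, `delay`, and `n` gives a qualitative feel for the trade-off the abstract describes; it does not reproduce the paper's statistical mechanical analysis.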