Max Planck Institute of Experimental Medicine, Hermann-Rein-Strasse 3, 37075 Göttingen, Germany.
Science. 2016 Mar 4;351(6277):aab4113. doi: 10.1126/science.aab4113.
The brain routinely discovers sensory clues that predict opportunities or dangers. However, it is unclear how neural learning processes can bridge the typically long delays between sensory clues and behavioral outcomes. Here, I introduce a learning concept, aggregate-label learning, that enables biologically plausible model neurons to solve this temporal credit assignment problem. Aggregate-label learning matches a neuron's number of output spikes to a feedback signal that is proportional to the number of clues but carries no information about their timing. Aggregate-label learning outperforms stochastic reinforcement learning at identifying predictive clues and is able to solve unsegmented speech-recognition tasks. Furthermore, it allows unsupervised neural networks to discover reoccurring constellations of sensory features even when they are widely dispersed across space and time.
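The count-matching idea in the abstract can be illustrated with a small sketch. It assumes a discrete-time leaky integrate-and-fire neuron and a deliberately simplified update: if the neuron fired fewer spikes than the feedback count, every synapse that was active in the trial is strengthened, and if it fired more, those synapses are weakened. This heuristic is an assumption made for illustration and is not the learning rule derived in the paper. The function names, the time constant, the threshold, and the learning rate are all hypothetical. The property it shares with the abstract is that the teacher supplies only a number of clues per trial and no information about their timing.

```python
import math


def count_output_spikes(weights, input_spikes, n_steps, tau=20.0, threshold=1.0):
    """Simulate a leaky integrate-and-fire neuron and return its spike count.

    input_spikes is a list of (time_step, synapse_index) events.
    """
    decay = math.exp(-1.0 / tau)  # per-step leak of the membrane potential
    events = {}
    for t, syn in input_spikes:
        events.setdefault(t, []).append(syn)

    v, n_out = 0.0, 0
    for t in range(n_steps):
        v *= decay
        for syn in events.get(t, []):
            v += weights[syn]
        if v >= threshold:
            n_out += 1
            v = 0.0  # reset after an output spike
    return n_out


def aggregate_label_step(weights, input_spikes, n_steps, label, lr=0.05):
    """One simplified count-matching update (illustrative assumption).

    label is the aggregate feedback: how many clues occurred in the trial.
    It carries no timing information.
    """
    n_out = count_output_spikes(weights, input_spikes, n_steps)
    error = label - n_out
    new_weights = list(weights)
    if error == 0:
        return new_weights, n_out  # spike count already matches the label

    direction = 1.0 if error > 0 else -1.0
    # Each input spike nudges its synapse toward more (or fewer) output spikes.
    for _, syn in input_spikes:
        new_weights[syn] += direction * lr
    return new_weights, n_out


# Example trial: synapse 0 fires twice, synapse 1 once; feedback says 2 clues.
weights = [0.6, 0.6]
trial = [(5, 0), (40, 0), (41, 1)]
weights, n_out = aggregate_label_step(weights, trial, n_steps=100, label=2)
```

Repeating `aggregate_label_step` over many trials moves the spike count toward the feedback count. Because this sketch credits every active synapse equally, it does not single out which inputs are the predictive clues, and that is the part of the problem the paper's method addresses.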