Barros Pablo, Eppe Manfred, Parisi German I, Liu Xun, Wermter Stefan
Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany.
Department of Psychology, University of CAS, Beijing, China.
Front Robot AI. 2019 Dec 11;6:137. doi: 10.3389/frobt.2019.00137. eCollection 2019.
Expectation learning is an unsupervised learning process which uses multisensory bindings to enhance unisensory perception. For instance, as humans, we learn to associate a barking sound with the visual appearance of a dog, and we continuously fine-tune this association over time, as we learn, e.g., to associate high-pitched barking with small dogs. In this work, we address the problem of developing a computational model that captures important properties of expectation learning, in particular focusing on the lack of explicit external supervision other than temporal co-occurrence. To this end, we present a novel hybrid neural model based on audio-visual autoencoders and a recurrent self-organizing network for multisensory bindings that facilitate stimulus reconstructions across different sensory modalities. We refer to this mechanism as stimulus prediction across modalities and demonstrate that the proposed model is capable of learning concept bindings by evaluating it on unisensory classification tasks for audio-visual stimuli, using the 43,500 YouTube videos from the animal subset of the AudioSet corpus.
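The cross-modal prediction mechanism described above can be illustrated with a minimal sketch: each modality is encoded into a shared binding space, and a stimulus from one modality is reconstructed by decoding that shared representation through the other modality's decoder. All dimensions, weights, and function names below are illustrative assumptions, not the paper's actual architecture or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration only.
AUDIO_DIM, VISUAL_DIM, BIND_DIM = 32, 64, 16

# Random linear maps stand in for the trained audio-visual autoencoders;
# the shared binding space plays the role of the multisensory bindings
# learned by the recurrent self-organizing network.
enc_audio = rng.standard_normal((BIND_DIM, AUDIO_DIM)) * 0.1
enc_visual = rng.standard_normal((BIND_DIM, VISUAL_DIM)) * 0.1
dec_audio = rng.standard_normal((AUDIO_DIM, BIND_DIM)) * 0.1
dec_visual = rng.standard_normal((VISUAL_DIM, BIND_DIM)) * 0.1

def predict_visual_from_audio(audio):
    """Cross-modal stimulus prediction: audio -> shared binding -> visual."""
    binding = np.tanh(enc_audio @ audio)   # encode into the shared space
    return dec_visual @ binding            # decode into the other modality

audio_stimulus = rng.standard_normal(AUDIO_DIM)
visual_prediction = predict_visual_from_audio(audio_stimulus)
print(visual_prediction.shape)  # (64,)
```

In the actual model the bindings are learned from temporal co-occurrence of audio and visual streams rather than fixed random weights; this sketch only shows the data flow that makes unisensory input yield a reconstruction in the other modality.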