Yu Haitao, Zhao Quanfa
School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072 China.
Cogn Neurodyn. 2024 Dec;18(6):3615-3628. doi: 10.1007/s11571-023-09932-4. Epub 2023 Feb 2.
The integration and interaction of cross-modal senses in the brain's neural networks facilitate high-level cognitive functions. In this work, we propose a bioinspired multisensory integration neural network (MINN) that integrates visual and auditory inputs to recognize information across sensory modalities. This deep learning-based model cascades parallel convolutional neural networks (CNNs), which extract intrinsic features from the visual and audio inputs, with a recurrent neural network (RNN) that handles multimodal integration and interaction. The network was trained on synthetic data generated for digit recognition tasks. We found that the spatial and temporal features extracted by the CNNs from the visual and audio inputs were encoded in mutually orthogonal subspaces. During the integration epoch, the network state evolved along quasi-rotation-symmetric trajectories, and a structural manifold with stable attractors formed in the RNN, supporting accurate cross-modal recognition. We further evaluated the robustness of MINN to noisy inputs and to asynchronous digit inputs. Experimental results demonstrated the superior performance of MINN in flexibly integrating and accurately recognizing multisensory information with distinct sensory properties. These results provide insights into the computational principles governing multisensory integration and offer a comprehensive neural network model for brain-inspired intelligence.
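The cascade described in the abstract (parallel feature extractors feeding a recurrent integrator) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, kernel shapes, single-layer branches, global max pooling, constant-input tanh RNN, and softmax readout are all assumptions made for illustration, and the weights are random rather than trained. The names `visual_features`, `audio_features`, and `integrate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def visual_features(image, kernels):
    """Toy visual branch: valid 2-D cross-correlation, ReLU, global max pooling."""
    k = kernels.shape[-1]
    # windows has shape (H-k+1, W-k+1, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    maps = np.einsum("ijkl,nkl->nij", windows, kernels)
    return np.maximum(maps, 0.0).max(axis=(1, 2))  # one value per kernel


def audio_features(waveform, kernels):
    """Toy audio branch: valid 1-D cross-correlation over time, ReLU, global max pooling."""
    k = kernels.shape[-1]
    # windows has shape (T-k+1, k)
    windows = np.lib.stride_tricks.sliding_window_view(waveform, k)
    maps = windows @ kernels.T
    return np.maximum(maps, 0.0).max(axis=0)  # one value per kernel


def integrate(features, w_in, w_rec, w_out, steps=20):
    """Vanilla tanh RNN driven by the same concatenated features at every step."""
    h = np.zeros(w_rec.shape[0])
    for _ in range(steps):
        h = np.tanh(w_in @ features + w_rec @ h)
    logits = w_out @ h
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Assumed sizes: 8 kernels per branch, 32 recurrent units, 10 digit classes.
n_vis, n_aud, n_hidden, n_classes = 8, 8, 32, 10
vis_kernels = rng.normal(size=(n_vis, 5, 5))
aud_kernels = rng.normal(size=(n_aud, 16))
w_in = rng.normal(scale=0.1, size=(n_hidden, n_vis + n_aud))
w_rec = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
w_out = rng.normal(scale=0.1, size=(n_classes, n_hidden))

# Placeholder inputs standing in for a digit image and a spoken-digit waveform.
image = rng.random((28, 28))
waveform = rng.normal(size=400)

features = np.concatenate(
    [visual_features(image, vis_kernels), audio_features(waveform, aud_kernels)]
)
probs = integrate(features, w_in, w_rec, w_out)
print(probs.shape, probs.sum())
```

Because the weights are untrained, the output probabilities carry no meaning; the sketch only shows how two modality-specific branches can share one recurrent state, which is the structural idea the abstract describes.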