Zhang Yuanyuan, Tian Manli, Liu Baolin
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.
Front Neuroinform. 2025 Feb 19;19:1526259. doi: 10.3389/fninf.2025.1526259. eCollection 2025.
Recently, numerous studies have focused on the semantic decoding of perceived images from functional magnetic resonance imaging (fMRI) activity. However, it remains unclear whether relationships can be established between brain activity and the semantic features of human actions in video stimuli. Here we construct a framework for decoding action semantics by learning such relationships.
To make effective use of the small amount of available brain activity data, our method employs a pre-trained action recognition network based on the expanding three-dimensional (X3D) deep neural network (DNN) framework. To apply brain activities to the action recognition network, we train regression models that learn the relationship between brain activity and deep-layer image features. To improve decoding accuracy, we add a non-local attention module to the X3D model to capture long-range temporal and spatial dependencies, propose a multilayer perceptron (MLP) module with a multi-task loss constraint to build a more accurate regression mapping, and perform data augmentation through linear interpolation to expand the amount of training data and reduce the impact of the small sample size.
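The abstract does not give implementation details for the regression mapping or the interpolation-based augmentation, so the following is only a minimal sketch of the general idea. It assumes a simple ridge regression stands in for the paper's MLP mapper with multi-task loss, and that augmentation blends neighboring (fMRI, feature) pairs; the `interpolate_pairs` helper, the array shapes, and the choice of `Ridge` are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Ridge

def interpolate_pairs(X, Y, alpha=0.5):
    """Data augmentation by linear interpolation: blend adjacent samples
    to synthesize extra (fMRI pattern, deep feature) training pairs.
    X: (n_samples, n_voxels) fMRI activity; Y: (n_samples, n_features)
    deep-layer features of the corresponding video clips."""
    X_new = alpha * X[:-1] + (1 - alpha) * X[1:]
    Y_new = alpha * Y[:-1] + (1 - alpha) * Y[1:]
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])

# Hypothetical data shapes: 200 clips, 5000 voxels, 2048-d deep features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))   # fMRI activity patterns
Y = rng.standard_normal((200, 2048))   # target deep-layer features

X_aug, Y_aug = interpolate_pairs(X, Y)

# Linear mapping from brain activity to deep features; in the paper an
# MLP with a multi-task loss constraint would replace this ridge mapper,
# and the predicted features would be fed to the X3D recognition network.
reg = Ridge(alpha=1.0).fit(X_aug, Y_aug)
Y_pred = reg.predict(X)                # decoded features for held-out scans
```

The key point illustrated here is that augmentation doubles the effective training set before the brain-to-feature mapping is fit, which is how a small fMRI sample can still support a high-dimensional regression target.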
Our findings indicate that the features in the X3D-DNN are biologically relevant and capture information useful for perception. The proposed method enriches semantic decoding models. We also conducted experiments with data from different subsets of brain regions known to process visual stimuli. The results suggest that semantic information about human actions is widespread across the entire visual cortex.