An action decoding framework combined with deep neural network for predicting the semantics of human actions in videos from evoked brain activities.

Author information

Zhang Yuanyuan, Tian Manli, Liu Baolin

Affiliation

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.

Publication information

Front Neuroinform. 2025 Feb 19;19:1526259. doi: 10.3389/fninf.2025.1526259. eCollection 2025.

Abstract

INTRODUCTION

Recently, numerous studies have focused on the semantic decoding of perceived images based on functional magnetic resonance imaging (fMRI) activities. However, it remains unclear whether it is possible to establish relationships between brain activities and semantic features of human actions in video stimuli. Here we construct a framework for decoding action semantics by establishing relationships between brain activities and semantic features of human actions.

METHODS

To make effective use of the small amount of available brain activity data, our proposed method employs a pre-trained action recognition network based on an expanding three-dimensional (X3D) deep neural network (DNN) framework. To apply brain activities to the action recognition network, we train regression models that learn the relationship between brain activities and deep-layer image features. To improve decoding accuracy, we add a non-local attention module to the X3D model to capture long-range temporal and spatial dependencies, propose a multilayer perceptron (MLP) module with a multi-task loss constraint to build a more accurate regression mapping, and perform data augmentation through linear interpolation to expand the amount of data and reduce the impact of the small sample size.
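The two data-side ideas in this paragraph (linear-interpolation augmentation of paired brain/feature samples, and a regression mapping from brain activity to deep-layer features) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array shapes, the consecutive-pair interpolation scheme, and the closed-form ridge regression (a simple stand-in for the paper's MLP regression with multi-task loss) are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real data (hypothetical shapes): 40 fMRI samples
# with 500 voxels each, paired with 64-dimensional deep-layer features.
X = rng.normal(size=(40, 500))   # brain activity patterns
Y = rng.normal(size=(40, 64))    # deep-layer features from the video network

def interpolate_pairs(X, Y, alpha=0.5):
    """Data augmentation via linear interpolation: blend consecutive
    (activity, feature) pairs to synthesize extra training samples."""
    X_new = alpha * X[:-1] + (1 - alpha) * X[1:]
    Y_new = alpha * Y[:-1] + (1 - alpha) * Y[1:]
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])

X_aug, Y_aug = interpolate_pairs(X, Y)

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y,
    a simple stand-in for the learned regression mapping."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

W = ridge_fit(X_aug, Y_aug)
Y_pred = X @ W                    # predicted deep features from brain activity
print(X_aug.shape, Y_pred.shape)
```

Augmenting the 40 original samples with 39 interpolated ones nearly doubles the training set, which is the point of the interpolation step when fMRI data are scarce.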

RESULTS AND DISCUSSION

Our findings indicate that the features in the X3D-DNN are biologically relevant and capture information useful for perception. The proposed method enriches the semantic decoding model. We have also conducted several experiments with data from different subsets of brain regions known to process visual stimuli. The results suggest that semantic information for human actions is widespread across the entire visual cortex.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f48/11880012/6094763efaa5/fninf-19-1526259-g001.jpg
