An action decoding framework combined with deep neural network for predicting the semantics of human actions in videos from evoked brain activities.

Author information

Zhang Yuanyuan, Tian Manli, Liu Baolin

Affiliation

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.

Publication information

Front Neuroinform. 2025 Feb 19;19:1526259. doi: 10.3389/fninf.2025.1526259. eCollection 2025.

Abstract

INTRODUCTION

Recently, numerous studies have focused on the semantic decoding of perceived images based on functional magnetic resonance imaging (fMRI) activities. However, it remains unclear whether it is possible to establish relationships between brain activities and semantic features of human actions in video stimuli. Here we construct a framework for decoding action semantics by establishing relationships between brain activities and semantic features of human actions.

METHODS

To make effective use of the small amount of available brain activity data, our proposed method employs a pre-trained action recognition network based on an expanding three-dimensional (X3D) deep neural network (DNN) framework. To apply brain activities to the action recognition network, we train regression models that learn the relationship between brain activities and deep-layer image features. To improve decoding accuracy, we add a non-local attention module to the X3D model to capture long-range temporal and spatial dependencies, propose a multilayer perceptron (MLP) module with a multi-task loss constraint to build a more accurate regression mapping, and perform data augmentation through linear interpolation to expand the amount of data and reduce the impact of the small sample size.
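The two data-side ideas in this paragraph (linear-interpolation augmentation of paired brain/feature samples, and a regression mapping from brain activity to deep-layer features) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array shapes, the consecutive-pair interpolation scheme, and the closed-form ridge regression (a simple stand-in for the paper's MLP regression with multi-task loss) are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real data (hypothetical shapes): 40 fMRI samples
# with 500 voxels each, paired with 64-dimensional deep-layer features.
X = rng.normal(size=(40, 500))   # brain activity patterns
Y = rng.normal(size=(40, 64))    # deep-layer features from the video network

def interpolate_pairs(X, Y, alpha=0.5):
    """Data augmentation via linear interpolation: blend consecutive
    (activity, feature) pairs to synthesize extra training samples."""
    X_new = alpha * X[:-1] + (1 - alpha) * X[1:]
    Y_new = alpha * Y[:-1] + (1 - alpha) * Y[1:]
    return np.vstack([X, X_new]), np.vstack([Y, Y_new])

X_aug, Y_aug = interpolate_pairs(X, Y)

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y,
    a simple stand-in for the learned regression mapping."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

W = ridge_fit(X_aug, Y_aug)
Y_pred = X @ W                    # predicted deep features from brain activity
print(X_aug.shape, Y_pred.shape)
```

Augmenting the 40 original samples with 39 interpolated ones nearly doubles the training set, which is the point of the interpolation step when fMRI data are scarce.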

RESULTS AND DISCUSSION

Our findings indicate that the features in the X3D-DNN are biologically relevant and capture information useful for perception. The proposed method enriches the semantic decoding model. We have also conducted several experiments with data from different subsets of brain regions known to process visual stimuli. The results suggest that semantic information for human actions is widespread across the entire visual cortex.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f48/11880012/6094763efaa5/fninf-19-1526259-g001.jpg
