National Digital Switching System Engineering and Technological Research Center, Zhengzhou, 450000 China.
Center for Magnetic Resonance Imaging, Department of Neuroscience, University of Minnesota at Twin Cities, 55108 MN, USA.
J Neurosci Methods. 2019 Sep 1;325:108318. doi: 10.1016/j.jneumeth.2019.108318. Epub 2019 Jun 27.
Building visual encoding models to accurately predict visual responses is a central challenge for current vision-based brain-machine interface techniques. To achieve high prediction accuracy on neural signals, visual encoding models should include precise visual features and appropriate prediction algorithms. Most existing visual encoding models employ hand-crafted visual features (e.g., Gabor wavelets or semantic labels) or data-driven features (e.g., features extracted from deep neural networks (DNN)). They also assume a linear mapping from feature representations to brain activity. However, it remains unknown whether such a linear mapping is sufficient for maximizing prediction accuracy.
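The conventional linear encoding model mentioned above can be sketched as ridge regression from stimulus features to voxel responses. The sketch below is illustrative only, not the paper's implementation: the feature matrix and voxel responses are synthetic stand-ins, and the regularization strength `alpha = 1.0` is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_feat, n_vox = 200, 64, 10
X = rng.standard_normal((n_stim, n_feat))          # stimulus features (e.g., Gabor or DNN features)
W_true = rng.standard_normal((n_feat, n_vox)) * 0.3  # hypothetical ground-truth weights
Y = X @ W_true + 0.1 * rng.standard_normal((n_stim, n_vox))  # simulated voxel responses

Xtr, Ytr, Xte, Yte = X[:150], Y[:150], X[150:], Y[150:]
alpha = 1.0
# Closed-form ridge solution: W = (X'X + alpha*I)^-1 X'Y
W_hat = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(n_feat), Xtr.T @ Ytr)
pred = Xte @ W_hat
# Held-out R^2 across all voxels as a simple accuracy summary
r2 = 1 - ((pred - Yte) ** 2).sum() / ((Yte - Yte.mean(0)) ** 2).sum()
```

In practice, encoding studies typically fit one regularized regression per voxel and evaluate prediction accuracy per voxel on held-out stimuli; the pooled R² here is only a compact summary for the sketch.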
We construct a new visual encoding framework to predict cortical responses in a benchmark functional magnetic resonance imaging (fMRI) dataset. In this framework, we employ transfer learning to incorporate a pre-trained DNN (i.e., AlexNet) and train a nonlinear mapping from visual features to brain activity. This nonlinear mapping replaces the conventional linear mapping and is expected to improve prediction accuracy on measured activity in the human visual cortex.
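The transfer-learning idea described above (frozen pre-trained features feeding a trainable nonlinear head) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the "DNN features" are random stand-ins for frozen AlexNet activations, and the nonlinear mapping is a hand-rolled one-hidden-layer MLP trained by full-batch gradient descent on a mean-squared-error loss; sizes and the learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim, n_feat, n_vox = 400, 20, 5
X = rng.standard_normal((n_stim, n_feat))            # stand-in for frozen DNN features
H = np.tanh(X @ (rng.standard_normal((n_feat, 8)) * 0.5))
Y = H @ rng.standard_normal((8, n_vox))              # simulated nonlinear voxel responses

# Trainable nonlinear head: features -> hidden tanh layer -> voxel responses
W1 = rng.standard_normal((n_feat, 16)) * 0.1
b1 = np.zeros(16)
W2 = rng.standard_normal((16, n_vox)) * 0.1
b2 = np.zeros(n_vox)
lr = 0.01
Xtr, Ytr = X[:300], Y[:300]
for _ in range(2000):
    A = np.tanh(Xtr @ W1 + b1)                       # hidden activations
    P = A @ W2 + b2                                  # predicted responses
    G = 2 * (P - Ytr) / len(Xtr)                     # gradient of MSE w.r.t. P
    gW2 = A.T @ G
    gA = (G @ W2.T) * (1 - A ** 2)                   # backprop through tanh
    gW1 = Xtr.T @ gA
    W2 -= lr * gW2; b2 -= lr * G.sum(0)
    W1 -= lr * gW1; b1 -= lr * gA.sum(0)

# Held-out prediction accuracy of the nonlinear mapping
Pte = np.tanh(X[300:] @ W1 + b1) @ W2 + b2
r2 = 1 - ((Pte - Y[300:]) ** 2).sum() / ((Y[300:] - Y[300:].mean(0)) ** 2).sum()
```

The design point is that only the mapping head is trained; in the actual framework the feature extractor would be a pre-trained AlexNet whose weights are reused rather than random projections.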
The proposed framework significantly predicts the responses of over 20% of voxels in early visual areas (i.e., V1 through the lateral occipital region, LO) and achieves unprecedented prediction accuracy.
Compared with two conventional visual encoding models, the proposed encoding model shows consistently higher prediction accuracy across all early visual areas, especially in relatively anterior visual areas (i.e., V4 and LO).
Our work proposes a new framework that utilizes pre-trained visual features and trains nonlinear mappings from visual features to brain activity.