Department of Brain and Cognitive Engineering, Korea University, Anam-ro 145, Seongbuk-gu, Seoul, 02841, Republic of Korea.
Section on Functional Imaging Methods, Lab of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, 20892, USA.
Neuroimage. 2019 Feb 1;186:607-627. doi: 10.1016/j.neuroimage.2018.10.054. Epub 2018 Oct 23.
An artificial neural network with multiple hidden layers (a deep neural network, or DNN) was employed for the first time as a predictive model of emotional responses, using whole-brain functional magnetic resonance imaging (fMRI) data from individual subjects. During fMRI data acquisition, 10 healthy participants listened to 80 International Affective Digital Sound stimuli and rated the emotions evoked by each sound stimulus along the arousal, dominance, and valence dimensions. The whole-brain spatial patterns from a general linear model (i.e., beta-valued maps) for each sound stimulus and the emotional response ratings were used as the input and output for the DNN, respectively. Based on a nested five-fold cross-validation scheme, the paired input and output data were divided into training (three folds), validation (one fold), and test (one fold) data. The DNN was trained and optimized using the training and validation data and was tested using the test data. The Pearson's correlation coefficients between the rated and predicted emotional responses from our DNN model with weight sparsity optimization (mean ± standard error: 0.52 ± 0.02 for arousal, 0.51 ± 0.03 for dominance, and 0.51 ± 0.03 for valence, with an input denoising level of 0.3 and a mini-batch size of 1) were significantly greater than those of three groups of comparison models: DNN models with conventional regularization schemes, including elastic net regularization (0.15 ± 0.05, 0.15 ± 0.06, and 0.21 ± 0.04 for arousal, dominance, and valence, respectively); shallow models, including logistic regression (0.11 ± 0.04, 0.10 ± 0.05, and 0.17 ± 0.04 for arousal, dominance, and valence, respectively; average of logistic regression and sparse logistic regression); and support vector machine-based predictive models (SVMs; 0.12 ± 0.06, 0.06 ± 0.06, and 0.10 ± 0.06 for arousal, dominance, and valence, respectively; average of linear and non-linear SVMs).
This difference was confirmed to be significant, with a Bonferroni-corrected p-value of less than 0.001 from a one-way analysis of variance (ANOVA) and subsequent paired t-tests. The weights of the trained DNNs were interpreted, and the input patterns that maximized or minimized the output of the DNNs (i.e., the emotional responses) were estimated. In a binary classification of each emotion dimension (e.g., high arousal vs. low arousal), the error rates for the DNN (31.2% ± 1.3% for arousal, 29.0% ± 1.7% for dominance, and 28.6% ± 3.0% for valence) were significantly lower than those for the linear SVM (44.7% ± 2.0%, 50.7% ± 1.7%, and 47.4% ± 1.9% for arousal, dominance, and valence, respectively) and the non-linear SVM (48.8% ± 2.3%, 52.2% ± 1.9%, and 46.4% ± 1.3% for arousal, dominance, and valence, respectively), as confirmed by a Bonferroni-corrected p < 0.001 from a one-way ANOVA. Our study demonstrates that the DNN model is able to reveal neuronal circuitry associated with human emotional processing, including structures in the limbic and paralimbic areas such as the amygdala, prefrontal areas, anterior cingulate cortex, insula, and caudate. Our DNN model was also able to use activation patterns in these structures to predict and classify emotional responses to stimuli.
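The data partitioning and the evaluation metric described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `five_fold_splits` and `pearson_r`, the random shuffling, and the choice of the fold after the test fold as the validation fold are all assumptions made for illustration. The abstract specifies only the three/one/one fold split and the use of Pearson's correlation between rated and predicted responses.

```python
import numpy as np


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train, val, test) index arrays for a 3/1/1 fold split.

    Hypothetical helper: each fold serves once as the test set, the
    following fold (an assumed convention) is used for validation,
    and the remaining three folds form the training set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for test_i in range(n_folds):
        val_i = (test_i + 1) % n_folds
        train = np.concatenate(
            [folds[i] for i in range(n_folds) if i not in (test_i, val_i)]
        )
        yield train, folds[val_i], folds[test_i]


def pearson_r(rated, predicted):
    """Pearson's correlation between rated and predicted responses."""
    rated = np.asarray(rated, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(rated, predicted)[0, 1])


# 80 sound stimuli per participant, as stated in the abstract.
splits = list(five_fold_splits(80))
```

With 80 stimuli, each of the five splits has 48 training, 16 validation, and 16 test samples. In a full pipeline, a model would be fitted on the training indices, tuned on the validation indices, and scored with `pearson_r` on the held-out test indices.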