Yu Sigang, Shi Enze, Wang Ruoyang, Zhao Shijie, Liu Tianming, Jiang Xi, Zhang Shu
Center for Brain and Brain-Inspired Computing Research, Department of Computer Science, Northwestern Polytechnical University, Xi'an, China.
School of Automation, Northwestern Polytechnical University, Xi'an, China.
Front Hum Neurosci. 2022 Sep 30;16:944543. doi: 10.3389/fnhum.2022.944543. eCollection 2022.
Naturalistic stimuli, including movies, music, and speech, have been increasingly applied in neuroimaging research. Relative to a resting state or a single-task state, naturalistic stimuli can evoke more intense brain activity and have been shown to have higher test-retest reliability, suggesting greater potential for studying adaptive human brain function. Naturalistic functional magnetic resonance imaging (N-fMRI) has become a powerful tool for recording brain states under naturalistic stimuli, and many efforts have been devoted to studying high-level semantic features from either spatial or temporal representations via N-fMRI. However, integrating both the spatial and the temporal characteristics of brain activity to better interpret the patterns under naturalistic stimuli is still underexplored. In this work, a novel hybrid learning framework that comprehensively investigates both the spatial (via Predictive Model) and the temporal [via convolutional neural network (CNN) model] characteristics of the brain is proposed. Specifically, to focus on certain relevant regions from the whole brain, regions of significance (ROS), which contain common spatial activation characteristics across individuals, are selected via the Predictive Model. Further, voxels of significance (VOS), whose signals contain significant temporal characteristics under naturalistic stimuli, are interpreted via a one-dimensional CNN (1D-CNN) model. In this article, the proposed framework is applied to N-fMRI data acquired during naturalistic classical/pop/speech audio stimuli. Promising performance is achieved via the Predictive Model in differentiating the audio categories; in particular, for distinguishing classical from speech audio, the classification accuracy is up to 92%. Moreover, spatial ROS and VOS are effectively obtained. In addition, the temporal characteristics of the high-level semantic features are investigated through the convolution kernels of the 1D-CNN model in the frequency domain, and we effectively bridge the "semantic gap" between high-level semantic features of N-fMRI and low-level acoustic features of naturalistic audio in the frequency domain. Our results provide novel insights into characterizing spatiotemporal patterns of brain activity via N-fMRI and effectively explore the high-level semantic features under naturalistic stimuli, which will further benefit the understanding of the brain's working mechanisms and advance the clinical application of naturalistic stimuli.
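As an illustration of what examining 1D-CNN convolution kernels in the frequency domain can look like, the following is a minimal sketch and not the authors' implementation. It assumes that a kernel's frequency content is read from the Fourier transform of its weights; the repetition time `tr`, the kernel length, the FFT size `n_fft`, and the synthetic `demo_kernel` are all hypothetical values chosen for the example, since the abstract does not state them.

```python
import numpy as np


def kernel_spectrum(kernel, tr_seconds, n_fft=64):
    """Return (frequencies in Hz, magnitude spectrum) of one 1D kernel.

    tr_seconds is the sampling interval of the fMRI time series
    (an assumed value here, not taken from the paper).
    """
    kernel = np.asarray(kernel, dtype=float)
    # Zero-pad the kernel to n_fft points for a finer frequency grid.
    magnitude = np.abs(np.fft.rfft(kernel, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=tr_seconds)
    return freqs, magnitude


def dominant_frequency(kernel, tr_seconds, n_fft=64):
    """Frequency (Hz) at which the kernel's magnitude spectrum peaks."""
    freqs, magnitude = kernel_spectrum(kernel, tr_seconds, n_fft)
    return freqs[np.argmax(magnitude)]


# Hypothetical stand-in for a learned kernel: 16 taps of a 0.125 Hz cosine
# sampled every 2 s. A real analysis would use trained 1D-CNN weights.
tr = 2.0
t = np.arange(16) * tr
demo_kernel = np.cos(2 * np.pi * 0.125 * t)

print(dominant_frequency(demo_kernel, tr))
```

With trained weights in place of `demo_kernel`, the same two functions could be applied to each kernel in turn, and the resulting spectra compared with the spectra of acoustic features resampled to the fMRI sampling rate.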