Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA.
Neuroimage. 2019 May 1;191:116-126. doi: 10.1016/j.neuroimage.2019.01.075. Epub 2019 Feb 5.
Human listeners can quickly and easily recognize different sound sources (objects and events) in their environment. Understanding how this impressive ability is accomplished can improve signal processing and machine intelligence applications, along with assistive listening technologies. However, it is not clear how the brain represents the many sounds that humans can recognize (such as speech and music) at the level of individual sources, categories, and acoustic features. To examine the cortical organization of these representations, we used patterns of fMRI responses to decode (1) four individual speakers and instruments from one another (separately, within each category), (2) the superordinate category label associated with each stimulus (speech or instrument), and (3) a set of simple synthesized sounds that could be differentiated entirely on the basis of their acoustic features. Data were collected using an interleaved silent steady-state sequence to increase the temporal signal-to-noise ratio and to mitigate issues with auditory stimulus presentation during fMRI. Largely separable clusters of voxels in the temporal lobes supported the decoding of individual speakers and instruments from other stimuli in the same category. Decoding the superordinate category of each sound was more accurate and involved a larger portion of the temporal lobes. However, these clusters all overlapped with areas that could decode simple, acoustically separable stimuli. Thus, individual sound sources from different sound categories are represented in separate regions of the temporal lobes, and these regions are situated within areas implicated in more general acoustic processes. These results bridge an important gap in our understanding of cortical representations of sounds and their acoustics.
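To illustrate the general form of the decoding analysis the abstract describes, the sketch below shows cross-validated classification of stimulus labels from voxel response patterns (multi-voxel pattern analysis). This is a minimal stand-in, not the authors' pipeline: the data are synthetic, the classifier choice and cross-validation scheme are assumptions, and all variable names are hypothetical.

```python
# Minimal MVPA-style decoding sketch (assumptions: synthetic data,
# linear SVM, stratified k-fold in place of leave-one-run-out).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)

# Hypothetical stand-in for fMRI data: 80 trials x 500 voxels,
# with 4 stimulus classes (e.g., four speakers or four instruments).
n_trials, n_voxels, n_classes = 80, 500, 4
labels = np.repeat(np.arange(n_classes), n_trials // n_classes)
patterns = rng.standard_normal((n_trials, n_voxels))
# Inject a weak class-dependent signal into a subset of voxels so the
# classifier has something to find.
patterns[:, :20] += labels[:, None] * 0.3

# Linear classifier with per-voxel standardization, scored by
# cross-validated accuracy against chance (1 / n_classes).
clf = make_pipeline(StandardScaler(), LinearSVC())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, patterns, labels, cv=cv)

print(f"Decoding accuracy: {scores.mean():.2f} "
      f"(chance = {1 / n_classes:.2f})")
```

The superordinate-category analysis in the abstract would correspond to the same procedure with binary labels (speech vs. instrument); in practice such analyses are typically run per region of interest or in a searchlight across the temporal lobes.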