Laboratory of Integrative Neuroscience and Cognition, Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20007, USA.
Proc Natl Acad Sci U S A. 2012 Feb 21;109(8):E505-14. doi: 10.1073/pnas.1113427109. Epub 2012 Feb 1.
Spoken word recognition requires complex, invariant representations. Using a meta-analytic approach incorporating more than 100 functional imaging experiments, we show that preference for complex sounds emerges in the human auditory ventral stream in a hierarchical fashion, consistent with nonhuman primate electrophysiology. Examining speech sounds, we show that activation associated with the processing of short-timescale patterns (i.e., phonemes) is consistently localized to left mid-superior temporal gyrus (STG), whereas activation associated with the integration of phonemes into temporally complex patterns (i.e., words) is consistently localized to left anterior STG. Further, we show that left mid- to anterior STG is reliably implicated in the invariant representation of phonetic forms and that this area also responds preferentially to phonetic sounds over artificial control sounds or environmental sounds. Together, these findings show increasing encoding specificity and invariance along the auditory ventral stream for temporally complex speech sounds.