Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.
Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.
PLoS Biol. 2022 Jul 28;20(7):e3001675. doi: 10.1371/journal.pbio.3001675. eCollection 2022 Jul.
The ability to recognize abstract features of voice during auditory perception is an intricate feat of human audition. For the listener, this occurs in near-automatic fashion to seamlessly extract complex cues from a highly variable auditory signal. Voice perception depends on specialized regions of auditory cortex, including superior temporal gyrus (STG) and superior temporal sulcus (STS). However, the nature of voice encoding at the cortical level remains poorly understood. We leverage intracerebral recordings across human auditory cortex during presentation of voice and nonvoice acoustic stimuli to examine voice encoding at the cortical level in 8 patient-participants undergoing epilepsy surgery evaluation. We show that voice selectivity increases along the auditory hierarchy from supratemporal plane (STP) to the STG and STS. Results show accurate decoding of vocalizations from human auditory cortical activity even in the complete absence of linguistic content. These findings show an early, less-selective temporal window of neural activity in the STG and STS followed by a sustained, strongly voice-selective window. Encoding models demonstrate divergence in the encoding of acoustic features along the auditory hierarchy, wherein STG/STS responses are best explained by voice category and acoustics, as opposed to acoustic features of voice stimuli alone. This is in contrast to neural activity recorded from STP, in which responses were accounted for by acoustic features. These findings support a model of voice perception that engages categorical encoding mechanisms within STG and STS to facilitate feature extraction.