Departments of Biomedical Engineering and Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627.
J Neurosci. 2022 Oct 12;42(41):7782-7798. doi: 10.1523/JNEUROSCI.2735-20.2022. Epub 2022 Aug 30.
In recent years, research on natural speech processing has benefited from recognizing that low-frequency cortical activity tracks the amplitude envelope of natural speech. However, it remains unclear to what extent this tracking reflects speech-specific processing beyond the analysis of the stimulus acoustics. In the present study, we aimed to disentangle contributions to cortical envelope tracking that reflect general acoustic processing from those that are functionally related to processing speech. To do so, we recorded EEG from subjects as they listened to auditory chimeras, stimuli composed of the temporal fine structure of one speech stimulus modulated by the amplitude envelope (ENV) of another speech stimulus. By varying the number of frequency bands used in making the chimeras, we obtained some control over which speech stimulus was recognized by the listener. No matter which stimulus was recognized, envelope tracking was always strongest for the ENV stimulus, indicating a dominant contribution from acoustic processing. However, there was also a positive relationship between intelligibility and the tracking of the perceived speech, indicating a contribution from speech-specific processing. These findings were supported by a follow-up analysis that assessed envelope tracking as a function of the (estimated) output of the cochlea rather than the original stimuli used in creating the chimeras. Finally, we sought to isolate the speech-specific contribution to envelope tracking using forward encoding models and found that indices of phonetic feature processing tracked reliably with intelligibility. Together, these results show that cortical speech tracking is dominated by acoustic processing but also reflects speech-specific processing.

Activity in auditory cortex is known to dynamically track the energy fluctuations, or amplitude envelope, of speech. Measures of this tracking are now widely used in research on hearing and language and have had a substantial influence on theories of how auditory cortex parses and processes speech. But how much of this speech tracking is actually driven by speech-specific processing rather than general acoustic processing is unclear, limiting its interpretability and its usefulness. Here, by merging two speech stimuli together to form so-called auditory chimeras, we show that EEG tracking of the speech envelope is dominated by acoustic processing but also reflects linguistic analysis. This has important implications for theories of cortical speech tracking and for using measures of that tracking in applied research.
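
To make the chimera construction concrete, the following is a minimal Python sketch of how the envelope of one signal can be imposed on the temporal fine structure of another within each of several frequency bands. It is an illustration only, not the procedure used in the study: the function names (`analytic_signal`, `band_split`, `make_chimera`), the ideal FFT-mask filters, the log-spaced band edges, and the 80 to 8000 Hz range are all assumptions chosen for brevity.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC term once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist term once
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def band_split(x, fs, edges):
    """Split x into adjacent bands with ideal (brick-wall) FFT masks.

    Assumption: a real filterbank would use smoother filters.
    """
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum = np.fft.rfft(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return bands


def make_chimera(env_source, tfs_source, fs, n_bands, f_lo=80.0, f_hi=8000.0):
    """Combine the per-band envelope of env_source with the per-band
    fine structure of tfs_source. Both inputs must have equal length.

    Assumption: band edges are log-spaced between f_lo and f_hi.
    """
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    env_bands = band_split(env_source, fs, edges)
    tfs_bands = band_split(tfs_source, fs, edges)
    chimera = np.zeros(len(env_source))
    for env_band, tfs_band in zip(env_bands, tfs_bands):
        envelope = np.abs(analytic_signal(env_band))          # slow amplitude
        fine = np.cos(np.angle(analytic_signal(tfs_band)))    # unit-amplitude carrier
        chimera += envelope * fine
    return chimera


# Toy usage with amplitude-modulated tones standing in for speech.
fs = 16000
t = np.arange(fs) / fs
signal_a = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
signal_b = (1 + 0.8 * np.sin(2 * np.pi * 7 * t)) * np.sin(2 * np.pi * 700 * t)
chimera = make_chimera(signal_a, signal_b, fs, n_bands=1)
```

In this toy case the chimera carries the 4 Hz modulation of `signal_a` on the 700 Hz carrier of `signal_b`. Raising `n_bands` splits both signals into more, narrower bands before the swap, which is the parameter the abstract describes varying to influence which stimulus listeners recognize.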