MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK.
Trends Hear. 2024 Jan-Dec;28:23312165241266316. doi: 10.1177/23312165241266316.
During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may have clinical utility as an objective measure of stimulus encoding by the brain, for example during cochlear implant listening, where the speech signal is severely spectrally degraded. Yet the interplay between acoustic and linguistic factors may lead to top-down modulation of perception, complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (N = 38; 18-35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from EEG, with accuracy evaluated on held-out data and decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary with spectral resolution, intelligible speech was associated with better decoding accuracy overall. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance across experimental conditions. In general, while robust neural decoding was observed at both the individual and group levels, within-participant variability would most likely preclude the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
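As an illustration of the backward (stimulus-reconstruction) modelling the abstract describes, the following minimal Python sketch trains a ridge-regularized linear decoder mapping time-lagged EEG onto the speech amplitude envelope and scores reconstruction with a Pearson correlation. The array names (eeg, envelope), lag count, and regularization strength are hypothetical placeholders, not the authors' pipeline; dedicated toolboxes such as the mTRF-Toolbox implement this approach more completely.

# Sketch of backward (stimulus-reconstruction) decoding, assuming
# `eeg` is an (n_samples, n_channels) array and `envelope` is an
# (n_samples,) speech amplitude envelope, resampled to a common rate.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lagged_design(eeg, n_lags):
    """Stack time-lagged copies of each EEG channel (lags 0..n_lags-1)."""
    n_samples, n_channels = eeg.shape
    # Zero-pad at the end so EEG at times t..t+n_lags-1 (the neural
    # response lagging the stimulus) reconstructs the envelope at t.
    padded = np.vstack([eeg, np.zeros((n_lags - 1, n_channels))])
    windows = sliding_window_view(padded, n_lags, axis=0)
    return windows.reshape(n_samples, n_channels * n_lags)

def train_decoder(eeg, envelope, n_lags=32, alpha=1e3):
    """Ridge-regularized least squares: lagged EEG -> envelope."""
    X = lagged_design(eeg, n_lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)

def reconstruction_accuracy(eeg, envelope, weights, n_lags=32):
    """Pearson r between reconstructed and actual envelope."""
    pred = lagged_design(eeg, n_lags) @ weights
    return np.corrcoef(pred, envelope)[0, 1]

In practice the decoder would be fit on training trials and reconstruction_accuracy computed on held-out trials, with alpha tuned by cross-validation.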
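The random permutation testing mentioned in the abstract can be sketched along the same lines. Here the null distribution is built by circularly shifting the envelope relative to the EEG, one common choice for destroying stimulus-response alignment while preserving the envelope's autocorrelation; this shift-based null and the parameters below are illustrative assumptions, not the specific scheme used in the study.

# Hypothetical permutation test for decoder significance, reusing
# reconstruction_accuracy() from the sketch above.
rng = np.random.default_rng(0)

def permutation_pvalue(eeg, envelope, weights, n_lags=32, n_perm=1000):
    """One-tailed p-value against a circular-shift null distribution.

    Assumes len(envelope) > 2 * n_lags so shifts avoid trivial overlap.
    """
    observed = reconstruction_accuracy(eeg, envelope, weights, n_lags)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shift = rng.integers(n_lags, len(envelope) - n_lags)
        null[i] = reconstruction_accuracy(
            eeg, np.roll(envelope, shift), weights, n_lags)
    # Proportion of null accuracies at least as large as the observed one.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)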