Fuglsang Søren Asp, Dau Torsten, Hjortkjær Jens
Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, 2800 Kgs. Lyngby, Denmark.
Neuroimage. 2017 Aug 1;156:435-444. doi: 10.1016/j.neuroimage.2017.04.026. Epub 2017 Apr 13.
Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and numbers of interfering talkers, listeners selectively attended to the speech stream of a particular talker. Across the different listening environments, we found that the attended talker could be accurately decoded from single-trial EEG data irrespective of the different distortions in the acoustic input. For highly reverberant environments, speech envelopes reconstructed from neural responses to the distorted stimuli resembled the original clean signal more than the distorted input. With reverberant speech, we observed a late cortical response to the attended speech stream that encoded temporal modulations in the speech signal without its reverberant distortion. Single-trial attention decoding accuracies based on 40-50 s long blocks of data from 64 scalp electrodes were equally high (80-90% correct) in all considered listening environments and remained statistically significant using down to 10 scalp electrodes and short (<30 s) unaveraged EEG segments. In contrast to the robust decoding of the attended talker, we found that decoding of the unattended talker deteriorated with the acoustic distortions.
These results suggest that cortical activity tracks an attended speech signal in a way that is invariant to acoustic distortions encountered in real-life sound environments. Noise-robust attention decoding additionally suggests a potential utility of stimulus reconstruction techniques in attention-controlled brain-computer interfaces.
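The stimulus-reconstruction approach referred to in the abstract is commonly implemented as a linear "backward" model: a ridge-regularized regression maps time-lagged EEG to the attended speech envelope, and attention is decoded by checking which talker's envelope correlates best with the reconstruction. The sketch below illustrates this general idea on synthetic data; it is a minimal, hypothetical example (all function names, the regularization value, lag count, and the toy signals are assumptions), not the authors' exact pipeline.

```python
import numpy as np

def lagged(eeg, n_lags):
    # Design matrix of time-lagged EEG: (samples, channels * n_lags).
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        X[k:, k * c:(k + 1) * c] = eeg[:n - k]
    return X

def train_decoder(eeg, envelope, n_lags=16, lam=1e3):
    # Ridge-regularized least squares: learn weights that reconstruct
    # the attended speech envelope from time-lagged EEG.
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def decode_attention(eeg, env_a, env_b, w, n_lags=16):
    # Reconstruct an envelope from EEG and label the talker whose true
    # envelope correlates more strongly with the reconstruction.
    rec = lagged(eeg, n_lags) @ w
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return 'A' if r_a > r_b else 'B'

# Toy demo: EEG channels carry a noisy, delayed copy of talker A's
# envelope, mimicking cortical tracking of the attended talker.
rng = np.random.default_rng(0)
n, n_ch = 4000, 10
env_a = rng.standard_normal(n)
env_b = rng.standard_normal(n)
eeg = np.zeros((n, n_ch))
for ch in range(n_ch):
    d = 2 + ch % 5                      # per-channel latency (samples)
    eeg[d:, ch] = env_a[:n - d] + 0.5 * rng.standard_normal(n - d)

w = train_decoder(eeg[:3000], env_a[:3000])
print(decode_attention(eeg[3000:], env_a[3000:], env_b[3000:], w))  # 'A'
```

In practice such decoders are trained on band-pass filtered, downsampled EEG, and decoding accuracy is evaluated per trial segment, as in the 40-50 s blocks described above.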