Wang Lei, Wu Ed X, Chen Fei
Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China.
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, Hong Kong.
Front Hum Neurosci. 2020 Oct 7;14:557534. doi: 10.3389/fnhum.2020.557534. eCollection 2020.
Through auditory attentional modulation, the attended speech stream can be detected robustly, even in adverse auditory scenarios, and can be decoded from electroencephalographic (EEG) data. Speech segmentation based on relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions, and high-RMS-level segments carry crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects attended to one of two speech streams in a mixture narrated concurrently by two Mandarin speakers. Temporal response functions were used to identify the attended speech from EEG responses tracking the temporal envelope of intact speech and that of high-RMS-level speech segments alone, respectively. Auditory decoding performance was then analyzed under various SNR conditions by comparing correlations between EEG responses and the attended versus ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope of high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under the different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for identifying and tracking attended speech in the presence of background noise. This study also showed that, with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures across a wide range of SNRs.
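The two core operations described above can be sketched in a few lines of Python: segmenting speech into frames and keeping those whose RMS level lies within a threshold of the utterance's overall RMS (the "high-RMS-level" segments), and the correlation-based attention decision, in which the stream whose envelope correlates more strongly with the EEG-derived envelope is declared attended. The frame length and dB threshold here are illustrative choices, not the parameters used in the study, and `decode_attention` assumes an envelope has already been reconstructed from EEG with a backward temporal response function model.

```python
import numpy as np

def relative_rms_segments(signal, fs, frame_len=0.02, threshold_db=-10.0):
    """Return a sample-level boolean mask selecting high-RMS-level frames.

    A frame is 'high-RMS-level' if its RMS level, relative to the overall
    RMS of the utterance, is at or above threshold_db. frame_len (seconds)
    and threshold_db are illustrative, not the paper's exact settings.
    """
    frame = int(frame_len * fs)
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    overall_rms = np.sqrt(np.mean(signal ** 2))
    level_db = 20.0 * np.log10(frame_rms / overall_rms + 1e-12)
    mask = level_db >= threshold_db        # True for high-RMS frames
    return mask.repeat(frame)              # expand to one flag per sample

def decode_attention(recon_env, attended_env, ignored_env):
    """Correlation-based decision: the candidate stream whose envelope
    correlates more strongly with the EEG-reconstructed envelope is
    taken to be the attended one."""
    r_att = np.corrcoef(recon_env, attended_env)[0, 1]
    r_ign = np.corrcoef(recon_env, ignored_env)[0, 1]
    return r_att > r_ign, r_att, r_ign
```

In practice the mask from `relative_rms_segments` would be applied to the speech envelope before it enters the decoder, so that only EEG tracking of high-RMS-level segments contributes to the correlation.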