Wang Lei, Wang Yihan, Liu Zhixing, Wu Ed X, Chen Fei
Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China.
Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
Front Neurosci. 2022 Feb 10;15:760611. doi: 10.3389/fnins.2021.760611. eCollection 2021.
In competing-speaker environments, human listeners need to focus or switch their auditory attention according to dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from neural signals. Moreover, previous studies revealed that root mean square (RMS)-level-based speech segmentation contributes substantially to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level-based speech segmentation on auditory attention decoding (AAD) performance with both sustained and switched attention in competing-speaker auditory scenes. Objective biomarkers derived from cortical activity were also developed to index dynamic auditory attention states. Subjects were asked to concentrate on one of two competing speaker streams or to switch their attention between them. The neural responses to the higher- and lower-RMS-level speech segments were analyzed using the linear temporal response function (TRF) before and after attention switched from one speaker stream to the other. Furthermore, the AAD performance of the unified TRF decoding model was compared with that of the speech-RMS-level-based segmented decoding model under dynamic changes of the auditory attention state. The results showed that the weight of the typical TRF component at a time lag of approximately 100 ms was sensitive to the switching of auditory attention. Compared with the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention across a wide range of signal-to-masker ratios (SMRs). In competing-speaker scenes, the TRF weight and AAD accuracy could be used as effective indicators to detect changes in auditory attention. In addition, across a wide range of SMRs (i.e., from 6 to -6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision windows, suggesting that this speech-RMS-level-based model has the potential to decode dynamic attention states in realistic auditory scenarios.
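The two building blocks named in the abstract, RMS-level-based segmentation and correlation-based attention decoding within a decision window, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the 16-ms frame length, and the -10 dB threshold relative to the overall RMS level are assumptions chosen for illustration, since the abstract does not state them. The reconstructed envelope is assumed to come from a TRF-based decoder trained elsewhere.

```python
import numpy as np


def rms_level_mask(speech, fs, frame_ms=16.0, threshold_db=-10.0):
    """Label each frame as higher-RMS-level (True) or lower-RMS-level (False).

    frame_ms and threshold_db are illustrative assumptions, not values
    taken from the abstract.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    n_frames = len(speech) // frame_len
    # Drop trailing samples that do not fill a whole frame.
    frames = np.asarray(speech[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)

    overall_rms = np.sqrt(np.mean(frames ** 2))
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # Frame level in dB relative to the overall RMS level of the utterance.
    eps = np.finfo(float).eps
    rel_db = 20.0 * np.log10((frame_rms + eps) / (overall_rms + eps))
    return rel_db >= threshold_db


def decode_attention(reconstructed, env_a, env_b):
    """Pick the speaker whose envelope correlates best with the
    envelope reconstructed from the neural signals in one decision window."""
    r_a = np.corrcoef(reconstructed, env_a)[0, 1]
    r_b = np.corrcoef(reconstructed, env_b)[0, 1]
    return "A" if r_a > r_b else "B"
```

In a segmented model of the kind the abstract describes, a mask like the one returned by `rms_level_mask` would be used to train and apply separate decoders for higher- and lower-RMS-level segments, whereas a unified model would use a single decoder for all segments. Running `decode_attention` on successive short windows would give a time course of decisions from which an attention switch could be detected.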