IEEE Trans Neural Syst Rehabil Eng. 2019 Apr;27(4):652-663. doi: 10.1109/TNSRE.2019.2903404. Epub 2019 Mar 7.
Identifying the target speaker in hearing aid applications is an essential ingredient to improve speech intelligibility. Recently, a least-squares-based method has been proposed to identify the attended speaker from single-trial EEG recordings for an acoustic scenario with two competing speakers. This least-squares-based auditory attention decoding (AAD) method aims at decoding auditory attention by reconstructing the attended speech envelope from the EEG recordings using a trained spatio-temporal filter. While the performance of this AAD method has mainly been studied for noiseless and anechoic acoustic conditions, it is important to fully understand its performance in realistic noisy and reverberant acoustic conditions. In this paper, we investigate AAD using EEG recordings for different acoustic conditions (anechoic, reverberant, noisy, and reverberant-noisy). In particular, we investigate the impact of different acoustic conditions for AAD filter training and for decoding. In addition, we investigate the influence on the decoding performance of the different acoustic components (i.e., reverberation, background noise, and interfering speaker) in the reference signals used for decoding and in the training signals used for computing the filters. First, we found that for all considered acoustic conditions it is possible to decode auditory attention with considerably high decoding performance. In particular, even when the acoustic conditions for AAD filter training and for decoding differ, the decoding performance remains comparably high. Second, when using speech signals affected by reverberation and/or background noise as reference signals, there is no significant difference in decoding performance compared to using clean speech signals. In contrast, when using reference signals affected by the interfering speaker, the decoding performance decreases significantly. Third, the experimental results indicate that it is even feasible to use training signals affected by reverberation, background noise, and/or the interfering speaker for computing the filters.
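To make the envelope-reconstruction idea concrete, the following is a minimal sketch of a least-squares (backward-model) AAD decoder: a spatio-temporal filter is trained to map time-lagged EEG to the attended speech envelope, and attention is then decoded by correlating the reconstructed envelope with the reference envelopes of the two competing speakers. The lag range, the ridge regularization value, and all variable names are illustrative assumptions and not the paper's exact configuration.

```python
import numpy as np

def build_lagged_matrix(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels) into a
    spatio-temporal design matrix of shape (samples, channels * n_lags)."""
    n_samples, n_channels = eeg.shape
    lagged = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        lagged[lag:, lag * n_channels:(lag + 1) * n_channels] = eeg[:n_samples - lag]
    return lagged

def train_decoder(eeg, attended_envelope, n_lags=16, ridge=1e-3):
    """Least-squares (ridge-regularized) spatio-temporal filter mapping
    lagged EEG to the attended speech envelope used for training."""
    X = build_lagged_matrix(eeg, n_lags)
    # Regularized normal equations: w = (X'X + lambda*I)^-1 X'y
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ attended_envelope)

def decode_attention(eeg, env_speaker1, env_speaker2, w, n_lags=16):
    """Reconstruct the envelope from EEG and decide for the speaker whose
    reference envelope correlates more strongly with the reconstruction."""
    reconstruction = build_lagged_matrix(eeg, n_lags) @ w
    r1 = np.corrcoef(reconstruction, env_speaker1)[0, 1]
    r2 = np.corrcoef(reconstruction, env_speaker2)[0, 1]
    return 1 if r1 >= r2 else 2
```

In this sketch, the choice of reference envelopes (clean versus reverberant, noisy, or speaker-contaminated) and of the training signals corresponds to the acoustic conditions compared in the paper.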