Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China; Tianjin Key Laboratory of Brain Science and Neuroengineering, Tianjin 300072, China.
Medizinische Physik, Carl von Ossietzky Universität Oldenburg and Cluster of Excellence "Hearing4all", Küpkersweg 74, 26129, Oldenburg, Germany.
Methods. 2022 Aug;204:410-417. doi: 10.1016/j.ymeth.2022.04.009. Epub 2022 Apr 18.
The human auditory system extracts relevant information in noisy environments while ignoring competing distractions, relying primarily on auditory attention. Studies have shown that the cerebral cortex responds differently to different sound-source locations and that auditory attention is time-varying. In this work, we propose a data-driven encoder-decoder model for auditory attention detection (AAD), denoted AAD-transformer. The model contains temporal self-attention and channel attention modules and reconstructs the speech envelope by dynamically assigning weights to the electroencephalogram (EEG) through these two attention mechanisms. In addition, the model is fully data-driven and requires no additional preprocessing steps. The proposed model was validated on a binaural listening dataset in which the speech stimuli were Mandarin, and it was compared with other models. The results showed that the decoding accuracy of the AAD-transformer with a 0.15-second decoding window was 76.35%, much higher than that of a linear temporal response function model with a 3-second decoding window (an improvement of 16.27%). This work provides a novel auditory attention detection method, and its data-driven character makes it well suited for neuro-steered hearing devices, especially for users who speak tonal languages.
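The abstract gives only a high-level description of the architecture. As a rough illustration, the following minimal PyTorch sketch shows one plausible instantiation of an envelope-reconstruction AAD model combining channel attention with temporal self-attention, followed by the standard correlation-based attention decision. All module designs, layer sizes, the 128 Hz sampling rate, and every identifier (ChannelAttention, AADTransformer, pearson_r) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze-and-excitation-style gating over EEG channels (assumed design).
    def __init__(self, n_channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(),
            nn.Linear(n_channels // reduction, n_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):              # x: (batch, time, channels)
        w = self.fc(x.mean(dim=1))     # average over time -> per-channel weights
        return x * w.unsqueeze(1)      # reweight EEG channels

class AADTransformer(nn.Module):
    # Hypothetical envelope-reconstruction model: channel attention,
    # then temporal self-attention (transformer encoder), then a linear readout.
    def __init__(self, n_channels=64, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.chan_att = ChannelAttention(n_channels)
        self.proj = nn.Linear(n_channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.readout = nn.Linear(d_model, 1)

    def forward(self, eeg):                   # eeg: (batch, time, channels)
        h = self.encoder(self.proj(self.chan_att(eeg)))
        return self.readout(h).squeeze(-1)    # (batch, time) reconstructed envelope

def pearson_r(a, b, eps=1e-8):
    # Pearson correlation along the time axis.
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    return (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + eps)

# Decision rule: attend to the speaker whose speech envelope correlates
# more strongly with the reconstruction (standard correlation-based AAD).
model = AADTransformer()
eeg = torch.randn(8, 19, 64)                  # 0.15 s windows at an assumed 128 Hz ≈ 19 samples
env_a, env_b = torch.randn(8, 19), torch.randn(8, 19)
rec = model(eeg)
decision = pearson_r(rec, env_a) > pearson_r(rec, env_b)   # True -> speaker A attended

The correlation step reflects the usual AAD evaluation protocol: the envelope reconstructed from EEG is compared against the envelopes of both competing speakers within each decoding window, and the speaker yielding the higher Pearson correlation is taken as the attended one.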