School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, People's Republic of China.
J Neural Eng. 2022 Oct 17;19(5). doi: 10.1088/1741-2552/ac975c.
Auditory attention decoding (AAD) determines which speaker a listener is focusing on by analyzing their EEG. A convolutional neural network (CNN) was previously adopted to extract spectro-spatial features (SSF) from short time intervals of EEG to detect auditory spatial attention without access to the stimuli. However, the SSF-CNN scheme does not consider the following factors: (a) single-band frequency analysis cannot represent the EEG pattern precisely; (b) power cannot represent the EEG feature related to the dynamic patterns of the attended auditory stimulus; and (c) the temporal feature of EEG, which represents the relationship between the EEG and the attended stimulus, is not extracted. To solve these problems, the SSF-CNN scheme was modified as follows. (a) Multiple frequency bands of EEG, rather than a single alpha band, were analyzed to represent the EEG pattern more precisely. (b) Differential entropy (DE), rather than power, was extracted from each frequency band to represent the degree of disorder of the EEG, which is related to the dynamic patterns of the attended auditory stimulus. (c) CNN and convolutional long short-term memory (ConvLSTM) were combined to extract spectro-spatial-temporal features from a 3D descriptor sequence constructed from the topographical activity maps of the multiple frequency bands. Experimental results on KUL, DTU, and PKU with 0.1 s, 1 s, 2 s, and 5 s decision windows demonstrated that: (a) the proposed model outperformed SSF-CNN and state-of-the-art AAD models. Specifically, when the auditory stimulus was unavailable, AAD accuracy was enhanced by at least 3.25%, 3.96%, and 5.08% on KUL, DTU, and PKU, respectively, compared with the baselines. On KUL, longer decision windows corresponded to lower enhancement, while on both DTU and PKU, longer decision windows corresponded to higher enhancement, except when the decision window length was 2 s on PKU or 5 s on DTU. (b) Each modification contributed to the performance enhancement.
DE feature, multi-band frequency analysis, and ConvLSTM-based temporal analysis help to enhance AAD accuracy.
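In the EEG literature, the DE feature is commonly computed under a Gaussian assumption, where the differential entropy of a band-filtered signal reduces to 0.5·ln(2πeσ²). A minimal sketch of multi-band DE extraction for a single channel follows; the band definitions, sampling rate, filter order, and simulated signal are illustrative assumptions, not details taken from this paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass filter of a 1-D signal."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def differential_entropy(x):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * var(x))."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

fs = 128                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
x = rng.standard_normal(2 * fs)            # 2 s of simulated single-channel EEG
# Example band boundaries (Hz); actual bands would follow the paper's setup.
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
de = {name: differential_entropy(bandpass(x, lo, hi, fs))
      for name, (lo, hi) in bands.items()}
```

Repeating this per channel and per decision window, and projecting the channel-wise DE values onto a 2D scalp grid, would yield one topographical map per band; stacking maps over successive windows gives the kind of 3D descriptor sequence the abstract describes as input to the CNN-ConvLSTM model.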