Nguyen Nhan Duc Thanh, Phan Huy, Geirnaert Simon, Mikkelsen Kaare, Kidmose Preben
IEEE Trans Neural Syst Rehabil Eng. 2025;33:2695-2706. doi: 10.1109/TNSRE.2025.3587637.
Auditory attention decoding (AAD) is the process of identifying the attended speech in a multi-talker environment from brain signals, typically recorded through electroencephalography (EEG). Over the past decade, AAD has undergone continuous development, driven by its promising application in neuro-steered hearing devices. Most AAD algorithms rely on the increased neural entrainment to the envelope of the attended speech, relative to the unattended speech, typically using a two-step approach: first, the algorithm predicts a representation of the attended speech envelope from the EEG; second, it identifies the attended speech as the candidate whose envelope representation correlates most strongly with that prediction. In this study, we propose a novel end-to-end neural network architecture, named AADNet, which combines these two stages into a single direct approach to the AAD problem. We compare the proposed network against traditional stimulus-decoding approaches, including linear stimulus reconstruction, canonical correlation analysis, and an alternative non-linear stimulus reconstruction, on three different datasets. AADNet shows a significant performance improvement for both subject-specific and subject-independent models. Notably, the average subject-independent classification accuracies across analysis window lengths range from 56.3% (1 s) to 78.1% (20 s), 57.5% (1 s) to 89.4% (40 s), and 56.0% (1 s) to 82.6% (40 s) on the three datasets, respectively, demonstrating a significantly improved ability to generalize to data from unseen subjects. These results highlight the potential of deep learning models for advancing AAD, with promising implications for future hearing aids, assistive devices, and clinical assessments.
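To make the two-step pipeline described above concrete, the following is a minimal sketch in Python/NumPy of the correlation-based decision step. It is an illustration under stated assumptions, not the paper's implementation: the decoder interface, function names (decode_attention, toy_decoder), and array shapes are all hypothetical placeholders for whichever trained reconstruction model (linear, CCA-based, or non-linear) is used in step one.

```python
import numpy as np

def decode_attention(eeg, decoder, envelopes):
    """Two-step AAD decision (sketch).

    eeg       : array of shape (channels, samples), one analysis window.
    decoder   : any trained regression model mapping EEG to a 1-D
                envelope prediction of length `samples` (placeholder).
    envelopes : list of 1-D candidate speech envelopes, same length.
    Returns the index of the candidate decoded as attended.
    """
    # Step 1: reconstruct the attended speech envelope from the EEG window.
    pred = decoder(eeg)
    # Step 2: correlate the reconstruction with each candidate envelope
    # and decode the stream with the highest Pearson correlation.
    corrs = [np.corrcoef(pred, env)[0, 1] for env in envelopes]
    return int(np.argmax(corrs))

if __name__ == "__main__":
    # Toy usage with random data and a stand-in "decoder" that simply
    # averages over channels; a real system would use a trained model.
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((64, 1000))   # 64 channels x 1000 samples
    env_a = rng.standard_normal(1000)
    env_b = rng.standard_normal(1000)
    toy_decoder = lambda x: x.mean(axis=0)
    print(decode_attention(eeg, toy_decoder, [env_a, env_b]))
```

An end-to-end model such as the proposed AADNet replaces both steps with a single network that consumes the EEG (and speech representations) and outputs the attended-speaker decision directly, which is what the comparison in the abstract evaluates.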