Department of Computing, Imperial College London, London SW7 2RH, United Kingdom.
Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2RH, United Kingdom.
J Neural Eng. 2022 Jul 6;19(4). doi: 10.1088/1741-2552/ac7976.
Objective. Smart hearing aids that can decode the focus of a user's attention could considerably improve speech comprehension in noisy environments. For this reason, methods for decoding auditory attention from electroencephalography (EEG) have attracted considerable interest. Recent studies suggest that integrating deep neural networks (DNNs) into existing auditory attention decoding (AAD) algorithms is highly beneficial, although it remains unclear whether these enhanced algorithms can perform robustly in different real-world scenarios. We therefore sought to characterise the performance of DNNs at reconstructing the envelope of an attended speech stream from EEG recordings in different listening conditions. In addition, given the relatively sparse availability of EEG data, we investigated the possibility of applying subject-independent algorithms to EEG recorded from unseen individuals.

Approach. Both linear models and nonlinear DNNs were employed to decode the envelope of clean speech from EEG recordings, with and without subject-specific information. The mean behaviour, as well as the variability of the reconstruction, was characterised for each model. We then trained subject-specific linear models and DNNs to reconstruct the envelope of speech in clean and noisy conditions, and investigated how well they performed in different listening scenarios. We also established that these models can be used to decode auditory attention in competing-speaker scenarios.

Main results. The DNNs offered a considerable advantage over their linear counterparts at reconstructing the envelope of clean speech. This advantage persisted even when subject-specific information was unavailable at training time. The same DNN architectures generalised to a distinct dataset, which contained EEG recorded under a variety of listening conditions. In competing-speaker and speech-in-noise conditions, the DNNs significantly outperformed the linear models. Finally, the DNNs offered a considerable improvement over the linear approach at decoding auditory attention in competing-speaker scenarios.

Significance. We present the first detailed study of the extent to which DNNs can be employed to reconstruct the envelope of an attended speech stream. We conclusively demonstrate that DNNs improve the reconstruction of the attended speech envelope. The variance of the reconstruction error is shown to be similar for both the DNNs and the linear model. DNNs therefore show promise for real-world AAD, since they perform well in multiple listening conditions and generalise to data recorded from unseen participants.
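For concreteness, below is a minimal sketch of the kind of linear backward (stimulus-reconstruction) pipeline that the abstract uses as a baseline: a regularised least-squares decoder maps time-lagged EEG to a speech envelope, and attention is attributed to whichever candidate envelope correlates best with the reconstruction. The function names, lag window, and ridge strength here are illustrative assumptions, not the authors' actual implementation; a DNN decoder would replace the linear map while the correlation-based decision step could remain unchanged.

```python
# A minimal sketch of correlation-based auditory attention decoding (AAD)
# via backward (stimulus-reconstruction) modelling. All array shapes,
# the lag range, and the regularisation strength alpha are illustrative
# assumptions, not the configuration used in the paper.
import numpy as np

def lag_matrix(eeg: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack time-lagged copies of the EEG (time x channels) so the
    decoder can integrate information over a short temporal window."""
    t, c = eeg.shape
    lagged = np.zeros((t, c * n_lags))
    for k in range(n_lags):
        lagged[k:, k * c:(k + 1) * c] = eeg[:t - k]
    return lagged

def fit_ridge_decoder(eeg, envelope, n_lags=32, alpha=1e2):
    """Regularised least-squares decoder:
    w = (X^T X + alpha * I)^-1 X^T y."""
    X = lag_matrix(eeg, n_lags)
    reg = alpha * np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + reg, X.T @ envelope)

def decode_attention(eeg, env_a, env_b, w, n_lags=32):
    """Reconstruct the envelope from EEG, then attribute attention to
    whichever candidate speech envelope correlates more strongly."""
    recon = lag_matrix(eeg, n_lags) @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "speaker A" if r_a > r_b else "speaker B"

if __name__ == "__main__":
    # Toy demonstration on synthetic data (not real EEG).
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((1000, 8))    # 1000 samples, 8 channels
    env_a = eeg @ rng.standard_normal(8)    # toy "attended" envelope
    env_b = rng.standard_normal(1000)       # toy "unattended" envelope
    w = fit_ridge_decoder(eeg, env_a)       # train on the attended stream
    print(decode_attention(eeg, env_a, env_b, w))  # -> "speaker A"
```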