Artificial Intelligence Technology & Systems, MIT Lincoln Laboratory, Lexington, MA 02421, USA.
Neural Netw. 2021 Aug;140:136-147. doi: 10.1016/j.neunet.2021.02.020. Epub 2021 Mar 4.
Future wearable technology may provide enhanced communication in noisy environments and the ability to pick out a single talker of interest in a crowded room simply by the listener shifting their attentional focus. Such a system relies on two components: speaker separation and decoding of the listener's attention to acoustic streams in the environment. To address the former, we present a system for joint speaker separation and noise suppression, referred to as the Binaural Enhancement via Attention Masking Network (BEAMNET). The BEAMNET system is an end-to-end neural network architecture based on self-attention. Binaural input waveforms are mapped to a joint embedding space via a learned encoder, and separate multiplicative masking mechanisms are included for noise suppression and speaker separation. Pairs of output binaural waveforms are then synthesized using learned decoders, each capturing a separated speaker while maintaining spatial cues. A key contribution of BEAMNET is that the architecture contains a separation path, an enhancement path, and an autoencoder path. This paper proposes a novel loss function which trains these paths simultaneously, so that disabling the masking mechanisms during inference causes BEAMNET to reconstruct the input speech signals. This allows dynamic control of the level of suppression applied by BEAMNET via a minimum gain level, which is not possible in other state-of-the-art approaches to end-to-end speaker separation. This paper also proposes a perceptually motivated waveform distance measure. Using objective speech quality metrics, the proposed system is demonstrated to perform well at separating two equal-energy talkers, even in high levels of background noise. Subjective testing shows an improvement in speech intelligibility across a range of noise levels for signals with artificially added head-related transfer functions and background noise. Finally, when used as part of an auditory attention decoder (AAD) system using existing electroencephalogram (EEG) data, BEAMNET is found to maintain the decoding accuracy achieved with ideal speaker separation, even in severe acoustic conditions. These results suggest that this enhancement system is highly effective at supporting auditory attention decoding in realistic noise environments, and could lead to improved speech perception in a cognitively controlled hearing aid.
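The encoder-mask-decoder structure described in the abstract (a learned encoder mapping the binaural mixture to a joint embedding, multiplicative masks for the enhancement and separation paths, learned decoders producing one binaural waveform per separated speaker, and a minimum gain level that can disable masking at inference) can be illustrated with the rough sketch below. This is a minimal, assumed PyTorch sketch, not the authors' BEAMNET implementation: the self-attention blocks of the actual architecture are replaced by simple 1x1 convolutions for brevity, and all layer choices, dimensions, and names (BeamnetSketch, min_gain, and so on) are placeholders.

import torch
import torch.nn as nn

class BeamnetSketch(nn.Module):
    """Illustrative encoder / mask / decoder skeleton; not the published BEAMNET."""

    def __init__(self, n_filters=256, kernel=16, stride=8, n_speakers=2):
        super().__init__()
        # Learned encoder: 2-channel (binaural) waveform -> joint embedding.
        self.encoder = nn.Conv1d(2, n_filters, kernel, stride=stride)
        # Multiplicative masks: one for noise suppression (enhancement path),
        # one per speaker (separation path).
        self.enhance_mask = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 1), nn.Sigmoid())
        self.separate_masks = nn.Sequential(
            nn.Conv1d(n_filters, n_filters * n_speakers, 1), nn.Sigmoid())
        # Learned decoders: one binaural output waveform per separated speaker.
        self.decoders = nn.ModuleList(
            [nn.ConvTranspose1d(n_filters, 2, kernel, stride=stride)
             for _ in range(n_speakers)])
        self.n_filters, self.n_speakers = n_filters, n_speakers

    def forward(self, mixture, min_gain=0.0):
        # mixture: (batch, 2, samples) binaural waveform.
        emb = self.encoder(mixture)
        # Flooring the masks at min_gain controls the amount of suppression;
        # min_gain = 1.0 disables masking entirely, so the network acts as an
        # autoencoder and reconstructs its input.
        enh = self.enhance_mask(emb).clamp(min=min_gain)
        sep = self.separate_masks(emb).clamp(min=min_gain)
        sep = sep.view(-1, self.n_speakers, self.n_filters, emb.shape[-1])
        # Each decoder maps its masked embedding back to a binaural waveform,
        # yielding one separated speaker per output.
        return [dec(emb * enh * sep[:, k]) for k, dec in enumerate(self.decoders)]

As a usage illustration under the same assumptions, BeamnetSketch()(torch.randn(1, 2, 16000), min_gain=0.2) would return two binaural estimates whose suppression is floored at a gain of 0.2, mirroring the dynamic suppression control described above.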