Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
J Acoust Soc Am. 2024 Aug 1;156(2):1355-1366. doi: 10.1121/10.0028339.
Air-conducted (AC) microphones capture high-quality desired speech together with ambient noise, whereas bone-conducted (BC) microphones are immune to ambient noise but capture only band-limited speech. This paper proposes a speech enhancement model that leverages the merits of both BC and AC speech. The proposed model takes the spectrograms of BC and AC speech as input and fuses them with an attention-based feature fusion module. The backbone network uses the fused features to estimate a mask of the target speech, which is then applied to the noisy AC speech to recover the target speech. The backbone is a lightweight densely gated convolutional attention network (DenGCAN), which comprises an encoder, bottleneck layers, and a decoder. Furthermore, this paper improves an attention gate and integrates it into the skip connections of DenGCAN, which allows the decoder to focus on the key areas of the feature map extracted by the encoder. Because DenGCAN adopts a self-attention mechanism, the proposed model has the potential to improve noise reduction performance at the expense of increased input-output latency. Experimental results demonstrate that the enhanced speech of the proposed model achieves an average wideband PESQ improvement of 1.870 over the noisy AC speech.
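The fuse-then-mask pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`fuse`, `toy_backbone`), the scalar branch weights, the 64-bin BC cutoff, and the random stand-in spectrograms are all assumptions made for illustration, and the sigmoid placeholder merely stands in for the DenGCAN backbone, whose details the abstract does not give.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(ac_mag, bc_mag, w_ac=1.0, w_bc=1.0):
    """Attention-style fusion: a softmax over the two branches gives
    per-bin weights, so the output is a convex mix of AC and BC.
    The scalar weights w_ac / w_bc are hypothetical stand-ins for
    learned parameters."""
    scores = np.stack([w_ac * ac_mag, w_bc * bc_mag])   # (2, F, T)
    scores -= scores.max(axis=0, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)       # softmax over branches
    return weights[0] * ac_mag + weights[1] * bc_mag

def toy_backbone(fused):
    """Placeholder for the backbone network (NOT DenGCAN): maps the
    fused features to a mask with values in (0, 1)."""
    return sigmoid((fused - fused.mean()) / (fused.std() + 1e-8))

rng = np.random.default_rng(0)
n_freq, n_frames = 257, 100

# Random complex spectrograms standing in for STFTs of real recordings.
noisy_ac = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
bc = rng.standard_normal((n_freq, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_frames))
bc[64:] = 0.0  # assumed cutoff: BC speech is band-limited

fused = fuse(np.abs(noisy_ac), np.abs(bc))
mask = toy_backbone(fused)
enhanced = mask * noisy_ac  # the mask is applied to the noisy AC spectrogram
```

In this sketch the mask is real-valued and bounded by 1, so it can only attenuate the noisy AC spectrogram. Whether the paper's mask is real or complex is not stated in the abstract.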