School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
Suresense Technology, Chongqing 400065, China.
Sensors (Basel). 2020 Sep 5;20(18):5050. doi: 10.3390/s20185050.
Background noise often degrades speech quality and intelligibility during internet voice calls. To address this problem on wearable smart devices, this paper introduces a real-time speech enhancement system that combines a dual-microphone beamformer assisted by a bone-conduction (BC) sensor with a neural network postfilter based on the simple recurrent unit (SRU). Because the BC sensor is far less sensitive to environmental noise than a regular air-conduction (AC) microphone, accurate voice activity detection (VAD) can be obtained from the BC signal and used to control the adaptive noise canceller (ANC) and the adaptive block matrix (ABM). The SRU-based postfilter is a recurrent neural network with a small number of parameters, which keeps the computational cost low. Sub-band signal processing compresses the input features of the neural network, and the scale-invariant signal-to-distortion ratio (SI-SDR) is adopted as the loss function to minimize distortion of the desired speech signal. Experimental results show that, compared with an AC-only beamformer followed by a postfiltering algorithm, the proposed system significantly improves speech quality and intelligibility for all tested noise types and levels.
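The SI-SDR loss mentioned in the abstract can be illustrated with a minimal sketch. It follows the commonly used definition of SI-SDR (project the estimate onto the target, then compare the energy of the projection with the energy of the residual) and is not taken from the paper's implementation. The function names `si_sdr` and `si_sdr_loss`, the mean removal step, and the stabilizing constant `eps` are assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Illustrative sketch only; mean removal and eps are assumptions.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")

    # Remove the mean so that a DC offset does not influence the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Optimal scaling of the target: alpha = <e, t> / <t, t>.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t)
    alpha = dot / (target_energy + eps)

    # Split the estimate into a scaled-target part and a residual.
    scaled_target = [alpha * b for b in t]
    signal_energy = sum(s * s for s in scaled_target)
    residual_energy = sum((a - s) ** 2 for a, s in zip(e, scaled_target))

    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


if __name__ == "__main__":
    # Synthetic signals for demonstration only (not data from the paper).
    clean = [math.sin(0.1 * n) for n in range(200)]
    noisy = [c + 0.1 * math.cos(0.37 * n) for n, c in enumerate(clean)]
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
```

Because the target is rescaled by `alpha` before the comparison, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, so the loss penalizes distortion rather than level mismatch. In a training setup the same computation would be written with a differentiable tensor library.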