Pandey Ashutosh, Wang DeLiang
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.
Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.
IEEE/ACM Trans Audio Speech Lang Process. 2023;31:1360-1370. doi: 10.1109/taslp.2023.3260711. Epub 2023 Mar 23.
Dealing with speech interference in a speech enhancement system requires either speaker separation or target speaker extraction. Speaker separation produces multiple output streams with arbitrary speaker assignments, while target speaker extraction requires an additional cue to select the speaker. Neither is suitable for a standalone speech enhancement system with a single output stream. In this study, we propose a novel training framework, called attentive training, to extend speech enhancement to deal with speech interruptions. Attentive training is based on the observation that, in the real world, multiple talkers are very unlikely to start speaking at the same time. A deep neural network can therefore be trained to create a representation of the first speaker and use it to attend to, or track, that speaker in a noisy multitalker mixture. We present experimental results and comparisons to demonstrate the effectiveness of attentive training for speech enhancement.
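The first-speaker premise can be illustrated with a minimal sketch of how one training pair might be assembled: the talker who starts first is the training target, and a second talker enters only after a delay. This is an illustration under stated assumptions, not the authors' data pipeline. The function name `make_attentive_training_pair`, the parameters `onset` and `noise_gain`, and the use of random signals as stand-ins for speech are all hypothetical; the abstract does not specify mixing levels, onset distributions, or the network.

```python
import numpy as np

def make_attentive_training_pair(first, later, noise, onset, noise_gain=0.1):
    """Build one (mixture, target) pair in which `first` starts before `later`.

    Hypothetical helper: `onset` is the sample index at which the second
    talker begins; `noise` is assumed to be at least as long as `first`.
    """
    if onset <= 0:
        raise ValueError("onset must be positive so that `first` speaks first")
    n = len(first)
    # Shift the second talker so it starts `onset` samples into the mixture.
    delayed = np.zeros(n, dtype=first.dtype)
    m = max(0, min(len(later), n - onset))
    delayed[onset:onset + m] = later[:m]
    # Single-stream target: the clean signal of the talker who started first.
    mixture = first + delayed + noise_gain * noise[:n]
    return mixture, first

# Random signals stand in for speech and noise (assumption for illustration).
rng = np.random.default_rng(0)
first = rng.standard_normal(16000)
later = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
mixture, target = make_attentive_training_pair(first, later, noise, onset=4000)
```

Before the onset the mixture contains only the first talker plus noise, which is the interval in which a network could form a representation of that talker before the interruption begins.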