Wang Heming, Pandey Ashutosh, Wang DeLiang
The Ohio State University, 281 W Lane Ave, Columbus, OH 43210, United States.
Center for Cognitive and Brain Science, 1835 Neil Ave, Columbus, OH 43210, United States.
Comput Speech Lang. 2025 Jan;89. doi: 10.1016/j.csl.2024.101677. Epub 2024 Jun 6.
Deep learning has led to dramatic performance improvements for the task of speech enhancement, where deep neural networks (DNNs) are trained to recover clean speech from noisy and reverberant mixtures. Most existing DNN-based algorithms operate in the frequency domain, as time-domain approaches are believed to be less effective for speech dereverberation. In this study, we employ two DNNs, ARN (attentive recurrent network) and DC-CRN (densely-connected convolutional recurrent network), and systematically investigate the effects of different components on enhancement performance, such as window sizes, loss functions, and feature representations. We conduct evaluation experiments in two main conditions: reverberant-only and reverberant-noisy. Our findings suggest that larger window sizes are helpful for dereverberation, and that adding transform operations (either convolutional or linear) to encode and decode waveform features improves the sparsity of the learned representations and boosts the performance of time-domain models. Experimental results demonstrate that ARN and DC-CRN with the proposed techniques outperform other strong enhancement baselines.
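To make the encode/decode idea concrete, the sketch below frames a waveform into large overlapping windows, applies a linear transform to encode each frame, decodes with the inverse transform, and reconstructs by overlap-add. The window/hop sizes and the orthogonal matrix standing in for learned encoder weights are illustrative assumptions, not the paper's actual ARN or DC-CRN configuration; in the paper these transforms are trained jointly with the network.

```python
import numpy as np

WIN, HOP = 512, 256  # hypothetical large window and 50% hop, not the paper's exact settings

def frame_signal(x, win=WIN, hop=HOP):
    """Split a 1-D waveform into overlapping frames of length `win`."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape: (n_frames, win)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

frames = frame_signal(x)

# Stand-in for a learned linear encoder: a random orthogonal matrix,
# so the decoder is simply its transpose and reconstruction is exact.
W_enc, _ = np.linalg.qr(rng.standard_normal((WIN, WIN)))
W_dec = W_enc.T

latent = frames @ W_enc        # encoded feature representation (per frame)
recon = latent @ W_dec         # decoded waveform frames

# Overlap-add reconstruction, normalizing by the per-sample overlap count.
y = np.zeros_like(x)
count = np.zeros_like(x)
for i, f in enumerate(recon):
    start = i * HOP
    y[start:start + WIN] += f
    count[start:start + WIN] += 1.0
y /= np.maximum(count, 1e-8)
```

With an invertible (here orthogonal) transform, `y` recovers `x` up to floating-point error; a trained encoder/decoder pair would instead be optimized so that the latent frames form a sparse, enhancement-friendly representation.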