CNN-based noise reduction for multi-channel speech enhancement system with discrete wavelet transform (DWT) preprocessing.

Author information

Cherukuru Pavani, Mustafa Mumtaz Begum

Affiliations

Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia.

Department of Information Science, Dayananda Sagar Academy of Technology and Management, Bangalore, Karnataka, India.

Publication information

PeerJ Comput Sci. 2024 Feb 28;10:e1901. doi: 10.7717/peerj-cs.1901. eCollection 2024.

Abstract

Multi-channel speech enhancement (MCSE) systems apply speech enhancement algorithms at multiple levels to improve the quality of speech signals in noisy environments. Many existing algorithms filter noise in speech enhancement systems, typically as a pre-processor that reduces noise and improves speech quality. However, they may perform poorly at low signal-to-noise ratios (SNRs). Speech devices are exposed to all kinds of environmental noise, which can extend to high frequencies. The objective of this research is to conduct a noise reduction experiment for an MCSE system under stationary and non-stationary environmental noise at varying speech SNR levels. The experiments compared the existing and the proposed MCSE systems in filtering environmental noise at low to high SNRs (-10 dB to 20 dB). They were conducted using the AURORA and LibriSpeech datasets, which include different types of environmental noise. The existing MCSE system (BAV-MCSE) uses beamforming, adaptive noise reduction and voice activity detection (BAV) algorithms to filter noise from speech signals. The proposed MCSE system (DWT-CNN-MCSE) is based on discrete wavelet transform (DWT) preprocessing and a convolutional neural network (CNN) that denoises the noisy input speech signals to improve accuracy. The performance of both systems was measured using spectrogram analysis and word recognition rate (WRR). Across the different noise types, the existing BAV-MCSE achieved its highest WRR of 93.77% at a high SNR (20 dB) but averaged only 5.64% at a low SNR (-10 dB). The proposed DWT-CNN-MCSE system performed well at low SNR, with a WRR of 70.55% and its largest improvement over BAV-MCSE, 64.91 percentage points of WRR, at -10 dB SNR.
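Two steps named in the abstract, preparing noisy speech at a chosen SNR and applying DWT preprocessing, can be sketched as follows. This is only a minimal illustration and not the authors' pipeline: the abstract does not state the wavelet family, the decomposition depth, or the CNN architecture, so the single-level Haar wavelet, the synthetic sine "speech" signal, the Gaussian noise, and all function names here are assumptions.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, target_db):
    """Scale `noise` so the speech-to-noise power ratio is `target_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measure_snr_db(speech, noisy):
    """SNR in dB of `noisy`, given the clean reference `speech`."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


def haar_dwt(x):
    """Single-level Haar DWT (assumed wavelet): returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    r = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    r = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / r, (a - d) / r))
    return out


# Synthetic stand-ins for a speech frame and environmental noise (assumptions).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]

# Lowest SNR condition mentioned in the abstract: -10 dB.
noisy = mix_at_snr(clean, noise, -10.0)

# The sub-band coefficients are what a denoising network could take as input.
approx, detail = haar_dwt(noisy)
print(round(measure_snr_db(clean, noisy), 2), len(approx), len(detail))
```

At -10 dB the noise carries ten times the power of the speech, which is why this condition is the hardest one reported. In practice a library such as PyWavelets would normally replace the hand-written Haar transform, and the coefficients would be fed to a trained CNN rather than printed.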

