CNN-based noise reduction for multi-channel speech enhancement system with discrete wavelet transform (DWT) preprocessing.

Author information

Cherukuru Pavani, Mustafa Mumtaz Begum

Affiliations

Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia.

Department of Information Science, Dayananda Sagar Academy of Technology and Management, Bangalore, Karnataka, India.

Publication information

PeerJ Comput Sci. 2024 Feb 28;10:e1901. doi: 10.7717/peerj-cs.1901. eCollection 2024.

Abstract

Multi-channel speech enhancement (MCSE) systems apply speech enhancement algorithms at multiple levels to improve the quality of speech signals in noisy environments. Many existing algorithms filter noise in speech enhancement systems and are typically employed as a pre-processor to reduce noise and improve speech quality. They may, however, perform poorly under low signal-to-noise ratio (SNR) conditions. Speech devices are exposed to all kinds of environmental noise, which can extend up to high-frequency noise. The objective of this research is to conduct a noise reduction experiment for an MCSE system under stationary and non-stationary environmental noise at varying speech SNR levels. The experiments examined how well the existing and the proposed MCSE systems filter environmental noise at SNRs ranging from low to high (-10 dB to 20 dB). The experiments used the AURORA and LibriSpeech datasets, which contain different types of environmental noise. The existing MCSE system (BAV-MCSE) uses beamforming, adaptive noise reduction and voice activity detection (BAV) algorithms to filter noise from speech signals. The proposed MCSE system (DWT-CNN-MCSE) is based on discrete wavelet transform (DWT) preprocessing and a convolutional neural network (CNN) that denoises the noisy input speech signals to improve accuracy. The performance of the existing BAV-MCSE and the proposed DWT-CNN-MCSE was measured using spectrogram analysis and word recognition rate (WRR). Across different noises, the existing BAV-MCSE reported its highest WRR, 93.77%, at a high SNR (20 dB), and an average WRR of 5.64% at a low SNR (-10 dB). The proposed DWT-CNN-MCSE system performed well at a low SNR, with a WRR of 70.55% and its largest improvement over BAV-MCSE (64.91 percentage points of WRR) at -10 dB SNR.
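
The abstract names the main ingredients of the experiment (noisy speech at a controlled SNR, DWT preprocessing ahead of a CNN, and a WRR comparison) without giving implementation details. The sketch below illustrates those three pieces only. Everything in it is an assumption made for illustration: the Haar wavelet, the single decomposition level, the 16 kHz sampling rate, the sine tone standing in for speech, and all function names are hypothetical and are not taken from the paper. The CNN itself is omitted because the abstract does not describe its architecture.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the known clean signal."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])  # pad odd-length input with its last sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)  # low-pass band
    detail = (even - odd) / np.sqrt(2)  # high-pass band
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt, used here only to check the transform."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out


# Assumed toy setup: 1 s at 16 kHz, a sine tone standing in for speech.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(t.size)

# Lowest SNR condition mentioned in the abstract.
noisy = mix_at_snr(speech, noise, snr_db=-10.0)

# DWT preprocessing: the two sub-bands stacked as a 2-channel input
# that a CNN denoiser could consume (the CNN is not shown).
approx, detail = haar_dwt(noisy)
features = np.stack([approx, detail])

# WRR figures reported in the abstract at -10 dB SNR.
wrr_bav = 5.64
wrr_dwt_cnn = 70.55
improvement = round(wrr_dwt_cnn - wrr_bav, 2)
```

The reported figures are consistent with reading the improvement as a simple difference: 70.55 minus 5.64 gives 64.91, which matches the value quoted for -10 dB SNR. A production pipeline would more likely use a wavelet library and a deeper decomposition, but the choice of wavelet and level here is a guess, not the authors' configuration.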


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c154/10909157/56caf5f134f8/peerj-cs-10-1901-g001.jpg
