Multichannel speech enhancement for automatic speech recognition: a literature review.

Author information

Zaland Zubair, Mustafa Mumtaz Begum, Mat Kiah Miss Laiha, Ting Hua-Nong, Mohamed Yusoof Mansoor Ali, Mohd Don Zuraidah, Muthaiyah Saravanan

Affiliations

Department of Software Engineering, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia.

Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia.

Publication information

PeerJ Comput Sci. 2025 Mar 27;11:e2772. doi: 10.7717/peerj-cs.2772. eCollection 2025.

Abstract

Multichannel speech enhancement (MCSE) is crucial for improving the robustness and accuracy of automatic speech recognition (ASR) systems. Because of the importance of ASR systems, MCSE has been researched extensively, leading to rapid advances in methods, models, and datasets. Most previous reviews point to the lack of a systematic literature review of MCSE for ASR systems. This systematic literature review aims to (1) comprehensively review the existing approaches to MCSE for ASR, (2) analyze MCSE and ASR performance across techniques, models, noise data, and environments, and (3) discuss the challenges, limitations, and future research directions in this area. We conducted keyword searches on several electronic databases, including Google Scholar, IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library, and ISI Web of Knowledge, to identify relevant journal and conference articles. From the initial search results we selected 240 articles based on the inclusion criteria, and 35 experimental articles remained after the exclusion criteria were applied. After backward snowballing and quality assessment, the final tally was 40 articles, comprising 23 journal articles and 17 conference articles. The review shows that research on MCSE for ASR is on an upward trend, with word error rate (WER), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI) as the most common performance measures. One of the major issues we found is the limited generality and comparability of MCSE studies, which makes it difficult to arrive at unified solutions for noise in speech recognition. This systematic literature review has extensively examined MCSE and ASR techniques. Key findings include the identification of MCSE methods that improve ASR performance across various models, techniques, noise types, and environments. We also identify several promising areas that researchers can explore in the future.
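
Of the three measures named in the abstract, WER is the simplest to state concretely. The following is a minimal sketch, not taken from the reviewed article, that assumes the usual definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deletion ("on") and one substitution ("light" -> "lights")
    # against a five-word reference.
    print(word_error_rate("turn on the kitchen light",
                          "turn the kitchen lights"))
```

Because insertions count as errors, WER under this definition can exceed 1.0 when the hypothesis is much longer than the reference. PESQ and STOI, by contrast, are computed on the audio signal rather than on transcripts and are not sketched here.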

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feb9/12190416/867e20c768ac/peerj-cs-11-2772-g001.jpg
