

Similar Articles

1
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition.
Interspeech. 2023 Aug;2023:4913-4917. doi: 10.21437/interspeech.2023-167.
2
Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.
IEEE/ACM Trans Audio Speech Lang Process. 2015 Jan;23(1):92-101. doi: 10.1109/TASLP.2014.2372314. Epub 2015 Jan 14.
3
Two-Step Joint Optimization with Auxiliary Loss Function for Noise-Robust Speech Recognition.
Sensors (Basel). 2022 Jul 19;22(14):5381. doi: 10.3390/s22145381.
4
Matrix sentence intelligibility prediction using an automatic speech recognition system.
Int J Audiol. 2015;54 Suppl 2:100-7. doi: 10.3109/14992027.2015.1061708. Epub 2015 Sep 18.
5
Real-time Controlling Dynamics Sensing in Air Traffic System.
Sensors (Basel). 2019 Feb 7;19(3):679. doi: 10.3390/s19030679.
6
Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.
Neural Netw. 2016 Jun;78:97-111. doi: 10.1016/j.neunet.2015.12.010. Epub 2015 Dec 30.
7
Cluster-Based Pairwise Contrastive Loss for Noise-Robust Speech Recognition.
Sensors (Basel). 2024 Apr 17;24(8):2573. doi: 10.3390/s24082573.
8
On training targets for deep learning approaches to clean speech magnitude spectrum estimation.
J Acoust Soc Am. 2021 May;149(5):3273. doi: 10.1121/10.0004823.
9
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:1778-1787. doi: 10.1109/taslp.2020.2998279. Epub 2020 May 28.
10
Multiexpert automatic speech recognition using acoustic and myoelectric signals.
IEEE Trans Biomed Eng. 2006 Apr;53(4):676-85. doi: 10.1109/TBME.2006.870224.

References Cited in This Article

1
Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.
IEEE/ACM Trans Audio Speech Lang Process. 2022;30:1374-1385. doi: 10.1109/taslp.2022.3161143. Epub 2022 Mar 22.
2
Dense CNN with Self-Attention for Time-Domain Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1270-1279. doi: 10.1109/taslp.2021.3064421. Epub 2021 Mar 8.
3
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:1778-1787. doi: 10.1109/taslp.2020.2998279. Epub 2020 May 28.
4
Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):189-198. doi: 10.1109/TASLP.2018.2876171. Epub 2018 Oct 15.
5
Supervised Speech Separation Based on Deep Learning: An Overview.
IEEE/ACM Trans Audio Speech Lang Process. 2018 Oct;26(10):1702-1726. doi: 10.1109/TASLP.2018.2842159. Epub 2018 May 30.
6
An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech.
J Acoust Soc Am. 2011 Nov;130(5):3013-27. doi: 10.1121/1.3641373.

Time-Domain Speech Enhancement for Robust Automatic Speech Recognition.

Author Information

Yang Yufeng, Pandey Ashutosh, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, USA.

Center for Cognitive and Brain Sciences, The Ohio State University, USA.

Publication Information

Interspeech. 2023 Aug;2023:4913-4917. doi: 10.21437/interspeech.2023-167.

DOI: 10.21437/interspeech.2023-167
PMID: 40313476
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12045131/
Abstract

It has been shown that the intelligibility of noisy speech can be improved by speech enhancement algorithms. However, speech enhancement has not been established as an effective frontend for robust automatic speech recognition (ASR) in noisy conditions compared to an ASR model trained on noisy speech directly. The divide between speech enhancement and ASR impedes the progress of robust ASR systems especially as speech enhancement has made big strides in recent years. In this work, we focus on eliminating this divide with an ARN (attentive recurrent network) based time-domain enhancement model. The proposed system fully decouples speech enhancement and an acoustic model trained only on clean speech. Results on the CHiME-2 corpus show that ARN enhanced speech translates to improved ASR results. The proposed system achieves 6.28% average word error rate, outperforming the previous best by 19.3% relatively.
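As a quick sanity check on the reported figures (an illustration, not code from the paper), the 6.28% average WER and the 19.3% relative improvement together pin down what the previous best system must have scored:

```python
def relative_wer_reduction(prev_wer: float, new_wer: float) -> float:
    """Relative word-error-rate reduction: (prev - new) / prev."""
    return (prev_wer - new_wer) / prev_wer

# The abstract reports 6.28% average WER and a 19.3% relative gain,
# which back-solves to a previous best of 6.28 / (1 - 0.193).
previous_best = 6.28 / (1 - 0.193)
print(round(previous_best, 2))  # roughly 7.78 (%)

# Consistency check: the implied previous best and 6.28% recover 19.3%.
print(round(relative_wer_reduction(previous_best, 6.28), 3))  # 0.193
```

So the reported result corresponds to lowering the CHiME-2 average WER from roughly 7.78% to 6.28%.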
