Histogram equalization with Bayesian estimation for noise robust speech recognition.

Authors

Suh Youngjoo, Kim Hoirin

Affiliation

School of Electrical Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea.

Publication information

J Acoust Soc Am. 2018 Feb;143(2):677. doi: 10.1121/1.5022800.

DOI: 10.1121/1.5022800
PMID: 29495754

Abstract

Histogram equalization is an efficient feature normalization technique for noise-robust automatic speech recognition. However, its performance degrades when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, a class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement in noisy environments, it still suffers from performance degradation due to overfitting when test data are insufficient. To address this issue, the proposed histogram equalization technique employs Bayesian estimation when estimating the test cumulative distribution function. A previous study on the Aurora-4 task reported that the proposed approach provided substantial performance gains in speech recognition systems based on Gaussian mixture model-hidden Markov model acoustic modeling. In this work, the proposed approach was examined in speech recognition systems based on the deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach, where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. Fusing the proposed features with mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.
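
The contrast between maximum likelihood and Bayesian estimation of the test cumulative distribution function can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binned (histogram) CDF, a Dirichlet prior whose mean is taken from the reference distribution, and a hypothetical `prior_strength` parameter. The class-based part of the method is omitted, and all function names and the synthetic data are invented for illustration.

```python
import numpy as np


def bin_probs_ml(x, edges):
    """Maximum likelihood bin probabilities: plain relative frequencies."""
    counts, _ = np.histogram(x, bins=edges)
    return counts / counts.sum()


def bin_probs_bayes(x, edges, prior_probs, prior_strength=20.0):
    """Posterior-mean bin probabilities under a Dirichlet prior.

    Assumption for illustration: the prior acts as `prior_strength`
    pseudo-observations distributed according to `prior_probs`.
    """
    counts, _ = np.histogram(x, bins=edges)
    return (counts + prior_strength * prior_probs) / (counts.sum() + prior_strength)


def equalize(x, edges, test_probs, ref):
    """Map each test value to the reference quantile at its estimated test CDF."""
    cdf_at_edges = np.concatenate(([0.0], np.cumsum(test_probs)))
    u = np.interp(x, edges, cdf_at_edges)  # estimated F_test(x)
    return np.quantile(ref, np.clip(u, 0.0, 1.0))  # F_ref^{-1}(u)


# Synthetic stand-ins: plentiful reference features, a short mismatched test set.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 5000)
test = 0.5 * rng.normal(0.0, 1.0, 30) + 1.5

edges = np.linspace(-6.0, 6.0, 41)  # 40 equal-width bins
prior = bin_probs_ml(ref, edges)    # assumed prior mean: the reference histogram

p_ml = bin_probs_ml(test, edges)
p_bayes = bin_probs_bayes(test, edges, prior)

y_ml = equalize(test, edges, p_ml, ref)
y_bayes = equalize(test, edges, p_bayes, ref)

print("non-empty bins (ML, Bayes):", np.count_nonzero(p_ml), np.count_nonzero(p_bayes))
```

With only 30 test frames, the maximum likelihood histogram leaves most bins empty, so the estimated CDF is a coarse staircase. The prior fills those bins with pseudo-counts, and its influence shrinks as more test data arrive.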
