Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures.

Authors

Darch Jonathan, Milner Ben, Vaseghi Saeed

Affiliation

School of Computing Sciences, University of East Anglia, Norwich, United Kingdom.

Publication

J Acoust Soc Am. 2008 Dec;124(6):3989-4000. doi: 10.1121/1.2997436.

DOI: 10.1121/1.2997436
PMID: 19206822

Abstract

The aim of this work is to develop methods that enable acoustic speech features to be predicted from mel-frequency cepstral coefficient (MFCC) vectors as may be encountered in distributed speech recognition architectures. The work begins with a detailed analysis of the multiple correlation between acoustic speech features and MFCC vectors. This confirms the existence of correlation, which is found to be higher when measured within specific phonemes rather than globally across all speech sounds. The correlation analysis leads to the development of a statistical method of predicting acoustic speech features from MFCC vectors that utilizes a network of hidden Markov models (HMMs) to localize prediction to specific phonemes. Within each HMM, the joint density of acoustic features and MFCC vectors is modeled and used to make a maximum a posteriori prediction. Experimental results are presented across a range of conditions, such as with speaker-dependent, gender-dependent, and gender-independent constraints, and these show that acoustic speech features can be predicted from MFCC vectors with good accuracy. A comparison is also made against an alternative scheme that substitutes the higher-order MFCCs with acoustic features for transmission. This delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.
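The prediction step described in the abstract can be illustrated with a minimal sketch. It assumes a single joint Gaussian over a scalar acoustic feature and the MFCC vector, in which case the maximum a posteriori estimate equals the conditional mean. The abstract does not specify the density family, so the single-Gaussian choice, the function names, the three-dimensional "MFCC" vectors, and the synthetic data are all illustrative assumptions. HMM decoding is omitted; in a phoneme-localized scheme one such model would be held per HMM state. The same covariances also give the multiple correlation between the feature and the MFCC vector. Only the Python standard library is used so the block stays self-contained.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_joint_gaussian(mfccs, feats):
    """Estimate means and (co)variances of the joint vector [feature, mfcc]."""
    n, d = len(mfccs), len(mfccs[0])
    mu_x = [sum(v[j] for v in mfccs) / n for j in range(d)]
    mu_f = sum(feats) / n
    cov_xx = [[sum((v[i] - mu_x[i]) * (v[j] - mu_x[j]) for v in mfccs) / n
               for j in range(d)] for i in range(d)]
    cov_xf = [sum((v[j] - mu_x[j]) * (f - mu_f)
                  for v, f in zip(mfccs, feats)) / n for j in range(d)]
    var_f = sum((f - mu_f) ** 2 for f in feats) / n
    return mu_x, mu_f, cov_xx, cov_xf, var_f


def map_predict(model, x):
    """Conditional mean: mu_f + cov_fx @ inv(cov_xx) @ (x - mu_x)."""
    mu_x, mu_f, cov_xx, cov_xf, _ = model
    w = solve(cov_xx, cov_xf)
    return mu_f + sum(wj * (xj - mj) for wj, xj, mj in zip(w, x, mu_x))


def multiple_correlation(model):
    """R = sqrt(cov_fx @ inv(cov_xx) @ cov_xf / var_f)."""
    _, _, cov_xx, cov_xf, var_f = model
    w = solve(cov_xx, cov_xf)
    return (sum(wj * cj for wj, cj in zip(w, cov_xf)) / var_f) ** 0.5


# Synthetic stand-in data (NOT from the paper): a feature that depends
# linearly on three "MFCC" dimensions, plus Gaussian noise.
random.seed(0)
true_w = [40.0, -25.0, 10.0]
mfccs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
feats = [500.0 + sum(w * x for w, x in zip(true_w, v)) + random.gauss(0, 5)
         for v in mfccs]

model = fit_joint_gaussian(mfccs, feats)
print("predicted feature:", map_predict(model, [1.0, 0.0, 0.0]))
print("multiple correlation R:", multiple_correlation(model))
```

Fitting a separate model on the frames of each phoneme, rather than one global model, is how a sketch like this would reflect the abstract's finding that correlation is higher within specific phonemes.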

Similar articles

1. Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures.
   J Acoust Soc Am. 2008 Dec;124(6):3989-4000. doi: 10.1121/1.2997436.
2. Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.
   J Acoust Soc Am. 2005 Aug;118(2):1134-43. doi: 10.1121/1.1953269.
3. Automatic recognition of pathological phoneme production.
   Folia Phoniatr Logop. 2008;60(6):323-31. doi: 10.1159/000170083. Epub 2008 Nov 11.
4. Static features in real-time recognition of isolated vowels at high pitch.
   J Acoust Soc Am. 2007 Oct;122(4):2389-404. doi: 10.1121/1.2772228.
5. Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.
   J Acoust Soc Am. 2012 Feb;131(2):1536-46. doi: 10.1121/1.3672706.
6. Modeling the temporal dynamics of distinctive feature landmark detectors for speech recognition.
   J Acoust Soc Am. 2008 Sep;124(3):1739-58. doi: 10.1121/1.2956472.
7. A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition.
   J Acoust Soc Am. 2008 Feb;123(2):1154-68. doi: 10.1121/1.2823754.
8. Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization.
   J Acoust Soc Am. 2008 Dec;124(6):3918-36. doi: 10.1121/1.2997478.
9. Auditory-model based robust feature selection for speech recognition.
   J Acoust Soc Am. 2010 Feb;127(2):EL73-9. doi: 10.1121/1.3284545.
10. Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition.
    J Acoust Soc Am. 2004 Sep;116(3):1774-80. doi: 10.1121/1.1777872.

Cited by

1. A novel approach for acoustic estimation of neck fluid volume between men and women.
   Med Biol Eng Comput. 2018 Jan;56(1):113-123. doi: 10.1007/s11517-017-1675-1. Epub 2017 Jul 5.