Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

Authors

Shao Xu, Milner Ben

Affiliation

School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, United Kingdom.

Publication information

J Acoust Soc Am. 2005 Aug;118(2):1134-43. doi: 10.1121/1.1953269.

DOI: 10.1121/1.1953269
PMID: 16158667

Abstract

This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
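The first method above, predicting fundamental frequency from a joint GMM of MFCCs and fundamental frequency, can be illustrated with a minimal sketch. The abstract does not give the exact estimator, so everything here is an assumption made for illustration: the function names, the dictionary layout of the mixture components, the use of a posterior-weighted conditional mean, and the simplification that the MFCC block of each covariance is diagonal. The numbers in the example are invented, not taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def predict_f0(x, components):
    """Predict f0 from one MFCC vector x using a joint GMM (hypothetical layout).

    Each component is a dict with:
      weight : mixture weight
      mu_x   : mean of the MFCC part
      var_x  : diagonal variances of the MFCC part (simplifying assumption)
      mu_f   : mean of f0
      cov_fx : cross-covariance between f0 and each MFCC dimension
    Returns the posterior-weighted conditional mean and the posteriors.
    """
    # Posterior probability of each component given x, computed in the log domain.
    log_post = [
        math.log(c["weight"]) + log_gauss_diag(x, c["mu_x"], c["var_x"])
        for c in components
    ]
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]

    # Per-component conditional mean: mu_f + cov_fx * var_x^-1 * (x - mu_x).
    f0 = 0.0
    for p, c in zip(post, components):
        cond_mean = c["mu_f"] + sum(
            s / v * (xi - m)
            for s, v, xi, m in zip(c["cov_fx"], c["var_x"], x, c["mu_x"])
        )
        f0 += p * cond_mean
    return f0, post


# Toy two-component model with made-up parameters.
toy_gmm = [
    {"weight": 0.5, "mu_x": [0.0, 0.0], "var_x": [1.0, 1.0],
     "mu_f": 120.0, "cov_fx": [0.0, 0.0]},
    {"weight": 0.5, "mu_x": [10.0, 10.0], "var_x": [1.0, 1.0],
     "mu_f": 220.0, "cov_fx": [0.5, 0.0]},
]

low_f0, _ = predict_f0([0.0, 0.0], toy_gmm)
high_f0, _ = predict_f0([11.0, 10.0], toy_gmm)
```

Under the same assumptions, the HMM-based scheme described in the abstract would differ mainly in which components are consulted: each frame would be assigned to an HMM state, and only that state's GMM would be passed to a function like `predict_f0`.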

Similar articles

1
Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.
J Acoust Soc Am. 2005 Aug;118(2):1134-43. doi: 10.1121/1.1953269.
2
Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures.
J Acoust Soc Am. 2008 Dec;124(6):3989-4000. doi: 10.1121/1.2997436.
3
Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.
J Acoust Soc Am. 2012 Feb;131(2):1536-46. doi: 10.1121/1.3672706.
4
Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?
Forensic Sci Int. 2024 Oct;363:112199. doi: 10.1016/j.forsciint.2024.112199. Epub 2024 Aug 22.
5
Analysis of acoustic parameters for consonant voicing classification in clean and telephone speech.
J Acoust Soc Am. 2012 Mar;131(3):EL197-202. doi: 10.1121/1.3678667.
6
Effect of fundamental frequency on medial [+voice]/[-voice] judgments.
Phonetica. 1995;52(3):188-95. doi: 10.1159/000262170.
7
Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition.
J Acoust Soc Am. 2004 Sep;116(3):1774-80. doi: 10.1121/1.1777872.
8
Influence of pre-voicing duration on dichotic performance.
Audiology. 1983;22(2):162-6. doi: 10.3109/00206098309072778.
9
Static features in real-time recognition of isolated vowels at high pitch.
J Acoust Soc Am. 2007 Oct;122(4):2389-404. doi: 10.1121/1.2772228.
10
Automatic classification and speaker identification of African elephant (Loxodonta africana) vocalizations.
J Acoust Soc Am. 2005 Feb;117(2):956-63. doi: 10.1121/1.1847850.