Detection of speech landmarks: use of temporal information.

Authors

Salomon Ariel, Espy-Wilson Carol Y, Deshmukh Om

Affiliation

Electrical and Computer Engineering Department, University of Maryland, A. V. Williams Building, College Park, Maryland 20742, USA.

Publication information

J Acoust Soc Am. 2004 Mar;115(3):1296-305. doi: 10.1121/1.1646400.

DOI: 10.1121/1.1646400
PMID: 15058352
Abstract

Studies by Shannon et al. [Science, 270, 303-304 (1995)], Van Tasell et al. [J. Acoust. Soc. Am. 82, 1152-1161 (1987)], and others show that human listeners can understand important aspects of the speech signal when spectral shape has been significantly degraded. These experiments suggest that temporal information is particularly important in human speech perception when the speech signal is heavily degraded. In this study, a system is developed that extracts linguistically relevant temporal information that can be used in the front end of an automatic speech recognition system. The parameters targeted include energy onset and offsets (computed using an adaptive algorithm) and measures of periodic and aperiodic content; together these are used to find abrupt acoustic events which signify landmarks. Overall detection rates for strongly robust events, robust events, and weak events in a portion of the TIMIT test database are 98.9%, 94.7%, and 52.1%, respectively. Error rates increase by less than 5% when the speech signals are spectrally impoverished. Use of the four temporal parameters as the front end of a hidden Markov model (HMM)-based system for the automatic recognition of the manner classes "sonorant," "fricative," "stop," and "silence" results in the same recognition accuracy achieved when the standard 39 cepstral-based parameters are used, 70.1%. The combination of the temporal parameters and cepstral parameters results in an accuracy of 74.8%.
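
To make the idea of energy onset and offset parameters concrete, here is a minimal sketch in Python. It is not the adaptive algorithm described in the abstract: it assumes a fixed frame size, a fixed two-frame lag, and a fixed threshold, and it omits the periodicity and aperiodicity measures entirely. The function names (`frame_log_energy`, `onset_offset_events`) and all parameter values are hypothetical choices for illustration.

```python
import math

def frame_log_energy(samples, frame_len=160, hop=80, floor=1e-10):
    """Short-time log energy in dB for a sequence of float samples."""
    energies = []
    for start in range(0, max(len(samples) - frame_len + 1, 0), hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Floor the power so that digital silence does not produce log(0).
        energies.append(10.0 * math.log10(max(power, floor)))
    return energies

def onset_offset_events(log_energy, lag=2, threshold_db=9.0):
    """Mark abrupt energy rises (onsets) and falls (offsets).

    Assumption: a fixed-lag difference with a fixed threshold stands in
    for the adaptive computation mentioned in the abstract.
    """
    diff = [0.0] * len(log_energy)
    for i in range(lag, len(log_energy)):
        diff[i] = log_energy[i] - log_energy[i - lag]

    events = []
    for i in range(1, len(diff) - 1):
        # Keep only local extrema of the difference that clear the threshold.
        if diff[i] >= threshold_db and diff[i] >= diff[i - 1] and diff[i] > diff[i + 1]:
            events.append((i, "onset"))
        elif diff[i] <= -threshold_db and diff[i] <= diff[i - 1] and diff[i] < diff[i + 1]:
            events.append((i, "offset"))
    return events

# Synthetic test signal (assumed 8 kHz sampling): silence, a 200 Hz tone, silence.
rate = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / rate) for n in range(1600)]
signal = silence + tone + silence

events = onset_offset_events(frame_log_energy(signal))
print(events)
```

On this synthetic signal the sketch reports a single onset near the start of the tone and a single offset near its end. Real speech would need the adaptive behavior the abstract mentions, since a fixed lag and threshold respond poorly to events of differing abruptness.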

Similar articles

1. Detection of speech landmarks: use of temporal information.
J Acoust Soc Am. 2004 Mar;115(3):1296-305. doi: 10.1121/1.1646400.
2. A model of auditory perception as front end for automatic speech recognition.
J Acoust Soc Am. 1999 Oct;106(4 Pt 1):2040-50. doi: 10.1121/1.427950.
3. Focus, prosodic context, and phonological feature specification: patterns of variation in fricative production.
J Acoust Soc Am. 2008 May;123(5):2769-79. doi: 10.1121/1.2890736.
4. A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition.
J Acoust Soc Am. 2008 Feb;123(2):1154-68. doi: 10.1121/1.2823754.
5. Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.
J Acoust Soc Am. 2012 Feb;131(2):1536-46. doi: 10.1121/1.3672706.
6. Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.
Neural Netw. 2013 Sep;45:62-9. doi: 10.1016/j.neunet.2013.02.006. Epub 2013 Mar 7.
7. Acoustic-phonetic features for the automatic classification of fricatives.
J Acoust Soc Am. 2001 May;109(5 Pt 1):2217-35. doi: 10.1121/1.1357814.
8. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.
J Acoust Soc Am. 2015 Apr;137(4):2047-59. doi: 10.1121/1.4916618.
9. The relative importance of spectral tilt in monophthongs and diphthongs.
J Acoust Soc Am. 2005 Mar;117(3 Pt 1):1395-404. doi: 10.1121/1.1861158.
10. Word recognition for temporally and spectrally distorted materials: the effects of age and hearing loss.
Ear Hear. 2012 May-Jun;33(3):349-66. doi: 10.1097/AUD.0b013e318242571c.

Cited by

1. A speech envelope landmark for syllable encoding in human superior temporal gyrus.
Sci Adv. 2019 Nov 20;5(11):eaay6279. doi: 10.1126/sciadv.aay6279. eCollection 2019 Nov.
2. Contribution of consonant landmarks to speech recognition in simulated acoustic-electric hearing.
Ear Hear. 2010 Apr;31(2):259-67. doi: 10.1097/AUD.0b013e3181c7db17.