Estimating speech spectra for copy synthesis by linear prediction and by hand.

Affiliation

Department of Psychology, Barnard College, Columbia University, New York, New York 10027, USA.

Publication information

J Acoust Soc Am. 2011 Oct;130(4):2173-8. doi: 10.1121/1.3631667.

Abstract

Linear prediction is a widely available technique for analyzing acoustic properties of speech, although this method is known to be error-prone. New tests assessed the adequacy of linear prediction estimates by using this method to derive synthesis parameters and testing the intelligibility of the synthetic speech that results. Matched sets of sine-wave sentences were created, one set using uncorrected linear prediction estimates of natural sentences, the other using estimates made by hand. Phoneme restrictions imposed on linguistic properties allowed comparisons between continuous and intermittent voicing, oral or nasal and fricative manner, and unrestricted phonemic variation. Intelligibility tests revealed uniformly good performance with sentences created by hand-estimation and a minimal decrease in intelligibility with estimation by linear prediction due to manner variation with continuous voicing. Poorer performance was observed when linear prediction estimates were used to produce synthetic versions of phonemically unrestricted sentences, but no similar decline was observed with synthetic sentences produced by hand estimation. The results show a substantial intelligibility cost of reliance on uncorrected linear prediction estimates when phonemic variation approaches natural incidence.
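
As an illustration of the kind of estimate under test, the following is a minimal sketch of autocorrelation-method linear prediction followed by root-solving to obtain candidate formant frequencies. The abstract does not state the analysis settings used in the study, so the window, model order, and the function names `lpc_coefficients` and `formant_candidates` are assumptions made for this example, not the authors' procedure. "Uncorrected" here simply means the root frequencies are taken as they come, with no manual review.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC (illustrative, not the paper's settings).

    Returns the prediction-error filter [1, a1, ..., a_order].
    """
    windowed = frame * np.hamming(len(frame))
    n = len(windowed)
    # Autocorrelation at lags 0..order (zero lag sits at index n - 1).
    r = np.correlate(windowed, windowed, mode="full")[n - 1:n + order]
    # Toeplitz normal equations: sum_k a_k * r[|i - k|] = -r[i], i = 1..order.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))


def formant_candidates(a, sample_rate):
    """Frequencies (Hz) of the complex poles of 1/A(z), sorted ascending.

    Each conjugate pole pair is reported once (upper half-plane root only).
    No bandwidth filtering or hand correction is applied.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    return np.sort(freqs)
```

On natural speech, a frame would typically be a few tens of milliseconds long and the order chosen relative to the sampling rate; every root is returned as a candidate, which is where spurious or merged peaks can enter an uncorrected track.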



