Coding of the speech spectrum in three time-varying sinusoids.

Author information

Remez R E, Rubin P E, Pisoni D B

Publication information

Ann N Y Acad Sci. 1983;405:485-9. doi: 10.1111/j.1749-6632.1983.tb31663.x.

Abstract

Recent perceptual experiments with normal adult listeners show that phonetic information can readily be conveyed by sinewave replicas of speech signals. These tonal patterns are made of three sinusoids set equal in frequency and amplitude to the respective peaks of the first three formants of natural-speech utterances. Unlike natural and most synthetic speech, the spectrum of sinusoidal patterns contains neither harmonics nor broadband formants, and is identified as grossly unnatural in voice timbre. Despite this drastic recoding of the short-time speech spectrum, listeners perceive the phonetic content if the temporal properties of spectrum variation are preserved. These observations suggest that phonetic perception may depend on properties of coherent spectrum variation, a second-order property of the acoustic signal, rather than any particular set of acoustic elements present in speech signals.
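The construction described above can be sketched as follows. This is a minimal illustration, not the authors' synthesizer. It assumes the formant tracks have already been measured and are supplied as frame times plus, for each frame, the peak frequencies (Hz) and linear amplitudes of F1 to F3. The function name `sinewave_replica`, the linear interpolation between frames, and the default 10 kHz sample rate are all hypothetical choices. Each tone's phase is accumulated sample by sample, so every sinusoid stays continuous while its frequency glides. This reflects the abstract's point that the time-varying pattern, not harmonic or formant structure, carries the phonetic information.

```python
import math

def sinewave_replica(frame_times, formant_freqs, formant_amps, sample_rate=10000):
    """Sum one time-varying sinusoid per formant track (hypothetical helper).

    frame_times:   increasing frame times in seconds (at least two frames).
    formant_freqs: per frame, a list of tone frequencies in Hz (e.g. F1, F2, F3 peaks).
    formant_amps:  per frame, a list of linear amplitudes, one per tone.
    Returns the synthesized signal as a list of float samples.
    """
    n_tones = len(formant_freqs[0])
    duration = frame_times[-1] - frame_times[0]
    n_samples = int(round(duration * sample_rate)) + 1
    phases = [0.0] * n_tones  # running phase of each sinusoid, in radians
    out = []
    frame = 0
    for n in range(n_samples):
        t = frame_times[0] + n / sample_rate
        # Advance to the frame interval [t0, t1] that contains t.
        while frame < len(frame_times) - 2 and t > frame_times[frame + 1]:
            frame += 1
        t0, t1 = frame_times[frame], frame_times[frame + 1]
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)  # interpolation weight
        sample = 0.0
        for k in range(n_tones):
            # Linearly interpolate this tone's frequency and amplitude between frames.
            f = (1 - w) * formant_freqs[frame][k] + w * formant_freqs[frame + 1][k]
            a = (1 - w) * formant_amps[frame][k] + w * formant_amps[frame + 1][k]
            sample += a * math.sin(phases[k])
            # Accumulate phase so the tone stays continuous as its frequency changes.
            phases[k] += 2 * math.pi * f / sample_rate
        out.append(sample)
    return out

# Usage with made-up formant values gliding over 100 ms (illustrative only).
frames = [0.0, 0.05, 0.1]
freqs = [[500.0, 1500.0, 2500.0], [600.0, 1400.0, 2400.0], [700.0, 1300.0, 2300.0]]
amps = [[1.0, 0.5, 0.25]] * 3
signal = sinewave_replica(frames, freqs, amps)
```

Phase accumulation is used instead of evaluating `sin(2 * pi * f(t) * t)` directly. The direct form introduces spurious frequency jumps whenever `f(t)` changes.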