Spectral restoration of speech: intelligibility is increased by inserting noise in spectral gaps.

Authors

Warren R M, Hainsworth K R, Brubaker B S, Bashford J A, Healy E W

Affiliation

Department of Psychology, University of Wisconsin-Milwaukee 53201, USA.

Publication information

Percept Psychophys. 1997 Feb;59(2):275-83. doi: 10.3758/bf03211895.

PMID:9055622

Abstract

In order to function effectively as a means of communication, speech must be intelligible under the noisy conditions encountered in everyday life. Two types of perceptual synthesis have been reported that can reduce or cancel the effects of masking by extraneous sounds: Phonemic restoration can enhance intelligibility when segments are replaced or masked by noise, and contralateral induction can prevent mislateralization by effectively restoring speech masked at one ear when it is heard in the other. The present study reports a third type of perceptual synthesis induced by noise: enhancement of intelligibility produced by adding noise to spectral gaps. In most of the experiments, the speech stimuli consisted of two widely separated narrow bands of speech (center frequencies of 370 and 6,000 Hz, each band having high-pass and low-pass slopes of 115 dB/octave meeting at the center frequency). These very narrow bands effectively reduced the available information to frequency-limited patterns of amplitude fluctuation lacking information concerning formant structure and frequency transitions. When stochastic noise was introduced into the gap separating the two speech bands, intelligibility increased for "everyday" sentences, for sentences that varied in the transitional probability of keywords, and for monosyllabic word lists. Effects produced by systematically varying noise amplitude and noise bandwidth are reported, and the implications of some of the novel effects observed are discussed.
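The stimulus construction described in the abstract can be illustrated with a rough sketch. This is not the authors' procedure: the FFT-domain, zero-phase filtering, the function names `triangular_band_gain` and `two_band_stimulus`, and the noise band edges and level in the usage note are all assumptions made for illustration. Only the center frequencies (370 and 6,000 Hz) and the 115 dB/octave slopes come from the abstract.

```python
import numpy as np


def triangular_band_gain(freqs, center_hz, slope_db_per_octave=115.0):
    """Linear gain that is 0 dB at center_hz and falls at a fixed dB/octave
    rate on both sides, so the two slopes meet at the center frequency."""
    gain = np.zeros_like(freqs, dtype=float)
    positive = freqs > 0  # log2 is undefined at DC, so DC gain stays 0
    octaves = np.abs(np.log2(freqs[positive] / center_hz))
    gain[positive] = 10.0 ** (-slope_db_per_octave * octaves / 20.0)
    return gain


def two_band_stimulus(signal, sample_rate, centers_hz=(370.0, 6000.0),
                      noise_edges_hz=None, noise_rms=0.0, seed=0):
    """Keep two narrow bands of `signal`; optionally add band-limited
    Gaussian noise between noise_edges_hz (hypothetical parameters)."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Apply both narrow-band gains in the frequency domain (zero phase).
    gain = sum(triangular_band_gain(freqs, c) for c in centers_hz)
    bands = np.fft.irfft(np.fft.rfft(signal) * gain, n=n)

    if noise_edges_hz is None or noise_rms <= 0:
        return bands

    # Band-limit white Gaussian noise to the spectral gap, then scale its RMS.
    rng = np.random.default_rng(seed)
    noise_spec = np.fft.rfft(rng.standard_normal(n))
    low, high = noise_edges_hz
    noise_spec[(freqs < low) | (freqs > high)] = 0.0
    noise = np.fft.irfft(noise_spec, n=n)
    noise *= noise_rms / np.sqrt(np.mean(noise ** 2))
    return bands + noise
```

Given a speech waveform `speech` sampled at 16 kHz, a call such as `two_band_stimulus(speech, 16000, noise_edges_hz=(1000.0, 3000.0), noise_rms=0.05)` would return the two speech bands with noise filling part of the gap; omitting the noise arguments returns the speech bands alone. The noise edges and level here are placeholders, since the abstract states only that noise amplitude and bandwidth were varied systematically.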
