Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK.
Proc Biol Sci. 2011 May 22;278(1711):1595-600. doi: 10.1098/rspb.2010.1554. Epub 2010 Nov 10.
Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output (a key source of phonetic detail) from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or than analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
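The idea that relative amplitude within a band pair implies a formant frequency can be illustrated with a minimal sketch. The abstract does not state the exact mapping used, so everything specific here is an assumption made for illustration: the function name `implied_formant_track`, the use of an amplitude-weighted mean of the two band centre frequencies, the log-frequency axis, the root-sum-of-squares amplitude, and the centre frequencies in the usage example.

```python
import math


def implied_formant_track(env_low, env_high, fc_low, fc_high):
    """Frame-by-frame implied frequency and amplitude for one band pair.

    Assumptions (not taken from the abstract): the implied frequency is the
    amplitude-weighted mean of the two band centre frequencies on a
    log-frequency axis, and the formant amplitude is the root sum of squares
    of the two band envelopes.
    """
    if len(env_low) != len(env_high):
        raise ValueError("envelopes must have the same number of frames")

    log_lo, log_hi = math.log(fc_low), math.log(fc_high)
    freqs, amps = [], []
    for a_lo, a_hi in zip(env_low, env_high):
        total = a_lo + a_hi
        # A silent frame carries no evidence, so fall back to the midpoint.
        weight_hi = 0.5 if total <= 0.0 else a_hi / total
        freqs.append(math.exp(log_lo + weight_hi * (log_hi - log_lo)))
        amps.append(math.sqrt(a_lo ** 2 + a_hi ** 2))
    return freqs, amps


def constant_mean_track(freqs):
    """Control condition: replace a contour with its mean value in every frame."""
    mean = sum(freqs) / len(freqs)
    return [mean] * len(freqs)


# Hypothetical envelopes for N1 and N2, with hypothetical centre frequencies.
env_n1 = [1.0, 0.5, 0.0]
env_n2 = [0.0, 0.5, 1.0]
f1_track, f1_amp = implied_formant_track(env_n1, env_n2, 300.0, 700.0)
f1_flat = constant_mean_track(f1_track)
```

As energy shifts from the lower band to the upper band, the implied F1 glides from the lower centre frequency towards the upper one. `constant_mean_track` mirrors the control condition in which each frequency contour is replaced by its mean.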