Başkent Deniz, Shannon Robert V
Department of Biomedical Engineering, University of Southern California, Los Angeles, USA.
Ear Hear. 2007 Jun;28(3):277-89. doi: 10.1097/AUD.0b013e318050d398.
To explore the combined acute effects of frequency shift and compression-expansion on speech recognition, using noiseband vocoder processing.
Recognition of vowels and consonants processed with a noiseband vocoder was measured in five normal-hearing subjects aged 27 to 35 yr. The speech signal was filtered into 8 or 16 analysis bands, and the envelope was extracted from each band. The carrier noise bands were modulated by these envelopes and resynthesized to produce the processed speech. In the baseline matched condition, the frequency ranges of the corresponding analysis and carrier bands were the same. In the shift-only condition, the frequency ranges of the carrier bands were shifted up or down relative to the analysis bands. In the compression-only and expansion-only conditions, the analysis band range was made larger or smaller, respectively, than the carrier band range. The combined effects of the two spectral distortions on speech recognition were explored by simultaneously applying the shift to the carrier bands and the compression or expansion to the analysis bands.
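The band manipulations described above can be illustrated with a minimal sketch that computes only the analysis and carrier band edges for each condition. It is not the authors' implementation. Several things here are assumptions that the abstract does not state: the band edges are taken to be equally spaced in cochlear distance using the Greenwood frequency-position function, the shift and the compression or expansion are expressed in millimetres, and the function names and the 10 to 26 mm carrier range are hypothetical.

```python
import math

# Greenwood frequency-position constants for the human cochlea
# (assumed mapping; x is distance from the apex in mm).
A, ALPHA, K = 165.4, 0.06, 0.88


def place_to_freq(x_mm):
    """Characteristic frequency (Hz) at cochlear place x_mm."""
    return A * (10 ** (ALPHA * x_mm) - K)


def freq_to_place(f_hz):
    """Inverse of place_to_freq: cochlear place (mm) for f_hz."""
    return math.log10(f_hz / A + K) / ALPHA


def band_edges(lo_mm, hi_mm, n_bands):
    """n_bands + 1 edge frequencies (Hz), equally spaced in mm."""
    step = (hi_mm - lo_mm) / n_bands
    return [place_to_freq(lo_mm + i * step) for i in range(n_bands + 1)]


def condition(n_bands, lo_mm, hi_mm, shift_mm=0.0, stretch_mm=0.0):
    """Return (analysis_edges, carrier_edges) for one condition.

    shift_mm   moves the carrier range up (> 0) or down (< 0).
    stretch_mm widens (> 0, compression) or narrows (< 0, expansion)
               the analysis range at both ends relative to the carriers.
    Both zero gives the matched baseline.
    """
    analysis = band_edges(lo_mm - stretch_mm, hi_mm + stretch_mm, n_bands)
    carrier = band_edges(lo_mm + shift_mm, hi_mm + shift_mm, n_bands)
    return analysis, carrier


if __name__ == "__main__":
    # Hypothetical carrier range: 10 to 26 mm from the apex, 8 bands.
    for name, kwargs in [
        ("matched", {}),
        ("shift only", {"shift_mm": 2.0}),
        ("compression only", {"stretch_mm": 2.0}),
        ("shift + compression", {"shift_mm": 2.0, "stretch_mm": 2.0}),
    ]:
        analysis, carrier = condition(8, 10.0, 26.0, **kwargs)
        print(name)
        print("  analysis:", [round(f) for f in analysis])
        print("  carrier: ", [round(f) for f in carrier])
```

Comparing the two edge lists band by band shows how a shift and a compression can partly offset each other over part of the frequency range while the overall ranges remain mismatched.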
When the spectral distortions of compression-expansion or shift were applied separately, performance was reduced relative to the baseline matched condition. However, when the two spectral distortions were applied simultaneously, a compensatory effect was observed: for some combinations, the reduction in performance was smaller than the reduction observed for each distortion individually.
The results of the present study are consistent with previous vocoder studies with normal-hearing subjects, which showed that spectral mismatch between analysis and carrier bands has a negative effect on speech recognition. The present results further show that, in certain conditions, matching the 1 to 2 kHz frequency range, which contains important speech information, can be more beneficial for speech recognition than matching the overall frequency ranges.