Douglas S. Brungart, Nandini Iyer, Brian D. Simpson
Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433-7901, USA.
J Acoust Soc Am. 2006 Apr;119(4):2327-33. doi: 10.1121/1.2170030.
When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse with all three types of synthetic signals than with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. The results indicate that, of the three synthetic signals, SWS preserves more of the voice characteristics used for speech segregation than MNB and MSB do. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and are therefore likely to have difficulty understanding speech in cocktail-party listening environments.
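The abstract names the synthetic signal types without describing how they are built. As a rough illustration only, the sketch below shows one common way to produce MNB-style and MSB-style signals with a generic channel vocoder, and how a masker can be scaled to set a target-to-masker SNR. The function names (`rms`, `mix_at_snr`, `bandpass_fft`, `envelope`, `vocode`), the FFT brick-wall filtering, the 50 Hz envelope cutoff, and the unit-RMS carriers are all hypothetical choices made for this example, not the processing parameters used in the study. SWS is not sketched because it requires formant-frequency and amplitude tracks as input.

```python
import numpy as np

# Hypothetical helpers for illustration; not the processing chain used in the study.


def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))


def mix_at_snr(target, masker, snr_db):
    """Scale `masker` so 20*log10(rms(target)/rms(scaled masker)) == snr_db, then add it."""
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return target + gain * masker


def bandpass_fft(x, fs, lo, hi):
    """Brick-wall filter: keep only FFT bins with lo <= f < hi (Hz)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))


def envelope(x, fs, cutoff=50.0):
    """Hilbert envelope (magnitude of the analytic signal), low-passed at `cutoff` Hz."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    env = np.abs(np.fft.ifft(np.fft.fft(x) * h))
    # Low-pass smoothing can dip slightly below zero; clip so the envelope stays non-negative.
    return np.maximum(bandpass_fft(env, fs, 0.0, cutoff), 0.0)


def vocode(x, fs, edges, carrier="noise", rng=None):
    """Channel vocoder: in each band, extract the envelope and impose it on a carrier.

    carrier="noise" -> MNB-style output (band-limited noise carriers)
    carrier="sine"  -> MSB-style output (a sinusoid at each band's geometric center)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi), fs)
        if carrier == "noise":
            c = bandpass_fft(rng.standard_normal(len(x)), fs, lo, hi)
        elif carrier == "sine":
            c = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
        else:
            raise ValueError(f"unknown carrier: {carrier!r}")
        # Normalize the carrier to unit RMS so each band's level follows its envelope.
        out += env * c / rms(c)
    return out
```

For example, vocoding a target sentence and a masker sentence with the same band edges (say, `np.geomspace(100, 8000, 9)` for eight log-spaced bands, again an assumption) and passing both to `mix_at_snr` with a negative `snr_db` gives a two-talker MNB or MSB mixture at a low SNR. Vocoder studies more commonly use conventional band-pass filter banks than FFT bin masking. The brick-wall version is used here only to keep the sketch short and dependent on NumPy alone.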