Aston University, Birmingham, UK.
Adv Exp Med Biol. 2013;787:323-31. doi: 10.1007/978-1-4614-1590-9_36.
How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. The effects of scaling the depth of formant-frequency variation in the competitor by a range of factors were explored using two types of competitor: one derived by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern), and one with a regular, arbitrary frequency contour (a triangle wave, not plausibly speech-like) matched to the average rate and depth of variation of the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not on depth relative to that of the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts.
Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints.
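The competitor manipulations described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' stimulus-generation code. It assumes that inversion and depth scaling are performed on a log-frequency scale about the contour's geometric mean, so that f_inv = g² / f and f_scaled = g · (f / g)^k. The function names, the triangle-wave parameterisation (cycles per second and peak-to-peak depth in octaves), and the phase-accumulation sine-wave synthesis are hypothetical choices made for illustration.

```python
import math

def geometric_mean(freqs):
    """Geometric mean of a frequency contour in Hz (the mean on a log scale)."""
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))

def invert_about_geometric_mean(freqs):
    """Mirror a contour about its geometric mean on a log-frequency scale.

    Assumption: log f_inv = 2 log g - log f, i.e. f_inv = g**2 / f.
    Rises become falls (and vice versa); the geometric mean is unchanged.
    """
    g = geometric_mean(freqs)
    return [g * g / f for f in freqs]

def scale_depth(freqs, k):
    """Scale the depth of frequency variation by factor k about the geometric mean.

    Assumption: scaling is done on a log scale. k = 1 leaves the contour
    unchanged, k = 0 gives a constant-frequency contour, k = -1 equals inversion.
    """
    g = geometric_mean(freqs)
    return [g * (f / g) ** k for f in freqs]

def triangle_contour(n_samples, sample_rate, cycles_per_s, centre_hz, depth_octaves):
    """Hypothetical triangle-wave frequency contour, one value per sample.

    The contour oscillates at cycles_per_s with a peak-to-peak excursion of
    depth_octaves, centred on centre_hz on a log2 scale.
    """
    contour = []
    for i in range(n_samples):
        phase = (i * cycles_per_s / sample_rate) % 1.0  # position within cycle, 0..1
        tri = 4.0 * abs(phase - 0.5) - 1.0              # triangle wave in [-1, 1]
        contour.append(centre_hz * 2.0 ** (tri * depth_octaves / 2.0))
    return contour

def synthesize_sine_formant(freqs, amps, sample_rate):
    """Sine-wave analogue of one formant: a sinusoid following sample-by-sample
    frequency (Hz) and amplitude contours.

    Phase is accumulated from the instantaneous frequency, so the waveform
    stays continuous as the frequency changes.
    """
    out, phase = [], 0.0
    for f, a in zip(freqs, amps):
        out.append(a * math.sin(phase))
        phase += 2.0 * math.pi * f / sample_rate
    return out
```

Because depth is scaled in the log domain, every scaled or inverted contour keeps the same geometric mean as the original. Setting `k = 0` yields a constant-frequency contour, analogous in spirit to the constant-frequency F2Cs described above.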