Shen Yi, Pearson Dylan V
Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, Indiana 47405, USA.
J Acoust Soc Am. 2017 Mar;141(3):1835. doi: 10.1121/1.4978060.
Modulation masking is known to impact speech intelligibility, but it is not clear whether the mechanism underlying this phenomenon is an invariant, bottom-up process or whether it is subject to factors such as perceptual segregation and stimulus uncertainty, which would indicate a top-down component. In the main experiment of the current study (Exp. II), listeners' ability to recognize sequences of synthesized vowels (i.e., the target) in sinusoidally amplitude-modulated noise (i.e., the masker) was evaluated. The target and masker were designed to be perceptually distinct to limit the top-down component of modulation masking. The duration of each vowel was either 25 or 100 ms, the rate at which the vowels were presented was either 1 or 6 Hz, and the masker modulation rate was varied between 0.5 and 16 Hz. The selective performance degradation that modulation masking would predict when the target and masker modulation spectra overlap was not observed. In addition, the results were adequately captured by a model of energetic masking that had no modulation-processing stage and was fitted using only the vowel-recognition performance in steady-state maskers obtained in Exp. I. These results suggest that speech modulation masking might not be mediated through an early-sensory mechanism.
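As an illustration of the masker type named above, the following is a minimal sketch of sinusoidally amplitude-modulated noise: a noise carrier multiplied by the envelope 1 + m·sin(2πf_m·t). Only the range of modulation rates (0.5 to 16 Hz) comes from the abstract. The Gaussian carrier, the sampling rate, the modulation depth m, the starting phase, and the function name `sam_noise` are assumptions made for this example and may differ from the stimuli actually used in the study.

```python
import math
import random


def sam_noise(duration_s, fs_hz, mod_rate_hz, mod_depth=1.0, seed=0):
    """Return a list of samples of sinusoidally amplitude-modulated noise.

    Assumptions (not taken from the abstract): Gaussian white-noise carrier,
    sine-phase envelope starting at t = 0, and a modulation depth between 0 and 1.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_samples = int(round(duration_s * fs_hz))
    samples = []
    for i in range(n_samples):
        t = i / fs_hz
        # Envelope swings between (1 - mod_depth) and (1 + mod_depth).
        envelope = 1.0 + mod_depth * math.sin(2.0 * math.pi * mod_rate_hz * t)
        samples.append(envelope * rng.gauss(0.0, 1.0))
    return samples


# One masker per modulation rate in the range reported in the abstract.
# The specific intermediate rates listed here are illustrative only.
maskers = {rate: sam_noise(2.0, 8000, rate) for rate in (0.5, 1, 2, 4, 8, 16)}
```

With full modulation depth the envelope reaches zero once per modulation cycle, which produces the periodic dips in masker level that make the masker's modulation spectrum peak at the modulation rate.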