Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA.
Department of Speech & Hearing Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA.
J Acoust Soc Am. 2024 Nov 1;156(5):2960-2973. doi: 10.1121/10.0034235.
Human speech perception declines in the presence of masking speech, particularly when the masker is intelligible and acoustically similar to the target. A prior investigation demonstrated a substantial reduction in masking when the intelligibility of competing speech was reduced by corrupting voiced segments with noise [Huo, Sun, Fogerty, and Tang (2023), "Quantifying informational masking due to masker intelligibility in same-talker speech-in-speech perception," in Interspeech 2023, pp. 1783-1787]. As this processing also reduced the prominence of voiced segments, it was unclear whether the unmasking was due to reduced linguistic content, acoustic similarity, or both. The current study compared the masking of original competing speech (high intelligibility) to competing speech with time reversal of voiced segments (VS-reversed, low intelligibility) at various target-to-masker ratios. Modeling results demonstrated similar energetic masking between the two maskers. However, intelligibility of the target speech was considerably better with the VS-reversed masker compared to the original masker, likely due to the reduced linguistic content. Further corrupting the masker's voiced segments resulted in additional release from masking. Acoustic analyses showed that the portion of target voiced segments overlapping with masker voiced segments and the similarity between target and masker overlapped voiced segments impacted listeners' speech recognition. Evidence also suggested modulation masking in the spectro-temporal domain interferes with listeners' ability to glimpse the target.
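The masker manipulation and level control described above can be illustrated with a minimal sketch. It is not the authors' processing pipeline: the abstract does not say how voiced segments were detected or how levels were calibrated. The sketch assumes that voiced-segment boundaries are already available as sample-index pairs (`voiced_spans`, a hypothetical input) and that the target-to-masker ratio (TMR) is defined on overall RMS levels. The function names `reverse_voiced_segments` and `mix_at_tmr` are illustrative only.

```python
import numpy as np

def reverse_voiced_segments(signal, voiced_spans):
    """Return a copy of `signal` with each voiced span time-reversed.

    `voiced_spans` is an assumed input: a list of (start, end) sample
    indices, end exclusive, obtained from some voicing detector.
    """
    out = np.array(signal, dtype=float, copy=True)
    for start, end in voiced_spans:
        # Reversal keeps the samples (and so the energy) of the segment,
        # but plays them backwards in time.
        out[start:end] = out[start:end][::-1].copy()
    return out

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so the RMS-based TMR equals `tmr_db`, then mix.

    Assumption: TMR = 20 * log10(rms(target) / rms(scaled masker)).
    Both signals are truncated to the shorter length before mixing.
    Returns the mixture and the gain applied to the masker.
    """
    n = min(len(target), len(masker))
    target = np.asarray(target[:n], dtype=float)
    masker = np.asarray(masker[:n], dtype=float)
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20))
    return target + gain * masker, gain
```

In use, a VS-reversed masker would be built with `reverse_voiced_segments(masker, spans)` and then combined with the target via `mix_at_tmr(target, vs_reversed, tmr_db)` for each TMR of interest. Because reversal only reorders samples within a span, the masker's overall RMS level is unchanged, so the original and VS-reversed maskers receive the same gain at a given TMR.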