Department of Otolaryngology-Head and Neck Surgery, University of North Carolina at Chapel Hill.
Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE.
J Speech Lang Hear Res. 2022 Aug 17;65(8):3117-3128. doi: 10.1044/2022_JSLHR-21-00620. Epub 2022 Jul 22.
Some speech recognition data suggest that children rely less on voice pitch and harmonicity to support auditory scene analysis than adults. Two experiments evaluated development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech.
Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differ in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured in all four combinations of target and masker speech, including matched and mismatched speaking styles for the target and masker.
Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a holistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could account for the difference in age effects between the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for the voiced targets in the one-talker masker.
These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancelation, and differences in energetic masking.