Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan.
Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540, Japan.
Sci Rep. 2022 Feb 22;12(1):3002. doi: 10.1038/s41598-022-06925-x.
The present investigation focused on how temporal degradation affected intelligibility in two types of languages, i.e., a tonal language (Mandarin Chinese) and a non-tonal language (Japanese). The temporal resolution of common daily-life sentences spoken by native speakers was systematically degraded with mosaicking (mosaicising), in which the power of the original speech in each regularly spaced time-frequency unit was averaged and the temporal fine structure was removed. The results showed very similar patterns of variation in intelligibility for the two languages over a wide range of temporal resolutions, implying that temporal degradation crucially affected speech cues other than tonal cues in degraded speech without temporal fine structure. Specifically, the intelligibility of both languages remained at ceiling up to a segment duration of about 40 ms, then gradually declined with increasing segment duration, and reached a floor at segment durations of about 150 ms or longer. The same limit of about 40 ms for ceiling performance appeared with another method of degradation, i.e., local time-reversal, implying that a common temporal processing mechanism was related to this limit. The general tendency was consistent with a dual time-window model of speech processing, in which a short (~20-30 ms) and a long (~200 ms) time window run in parallel.
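The averaging step of mosaicking described above can be illustrated with a minimal sketch. It assumes the speech has already been split into frequency bands and converted to a power array of shape (bands, samples); the function name `mosaic_power` and its parameters are hypothetical and are not taken from the paper. The sketch covers only the per-segment power averaging, not the band filtering or the resynthesis without temporal fine structure.

```python
import numpy as np

def mosaic_power(power, fs, segment_ms):
    """Average band power within consecutive, regularly spaced time segments.

    power      : array of shape (n_bands, n_samples), power per band over time
                 (assumed to be computed beforehand; not shown here)
    fs         : sampling rate of the power array in Hz
    segment_ms : segment duration in milliseconds (e.g. 40 or 150)
    """
    power = np.asarray(power, dtype=float)
    # Number of samples per segment; at least one sample.
    seg_len = max(1, int(round(fs * segment_ms / 1000.0)))
    out = np.empty_like(power)
    n_samples = power.shape[1]
    for start in range(0, n_samples, seg_len):
        stop = min(start + seg_len, n_samples)
        # Replace every value in this time-frequency unit with the unit's mean power.
        out[:, start:stop] = power[:, start:stop].mean(axis=1, keepdims=True)
    return out
```

Calling `mosaic_power(power, fs=16000, segment_ms=40)` would flatten the power within each 40-ms unit of every band. Longer segments remove more temporal detail, which is the manipulation whose effect on intelligibility the abstract reports.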