A glimpsing account of the role of temporal fine structure information in speech recognition.

Affiliation

Department of Speech and Hearing Science, The Ohio State University, Columbus, OH 43210, USA.

Publication information

Adv Exp Med Biol. 2013;787:119-26. doi: 10.1007/978-1-4614-1590-9_14.

Abstract

Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal auditory system takes advantage of the original TFS, however, remains unclear. Three experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, the last experiment evaluated the relative contribution of envelope and TFS information. In contrast to previous studies, however, the original envelope and TFS were both preserved, to some extent, in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions corresponding to the target speech signal.
