Effects of lowpass and highpass filtering on the intelligibility of speech based on temporal fine structure or envelope cues.

Affiliation

Laboratoire de Psychologie de la Perception, CNRS, Université Paris Descartes, DEC, École Normale Supérieure, 29 rue d'Ulm, 75005 Paris, France.

Publication information

Hear Res. 2010 Feb;260(1-2):89-95. doi: 10.1016/j.heares.2009.12.002. Epub 2009 Dec 4.

DOI:10.1016/j.heares.2009.12.002

PMID:19963053

Abstract

This study aimed to assess whether or not temporal envelope (E) and fine structure (TFS) cues in speech convey distinct phonetic information. Syllables uttered by a male and a female speaker were (i) processed to retain either E or TFS within 16 frequency bands, (ii) lowpass or highpass filtered at different cut-off frequencies, and (iii) presented for identification to seven listeners. Psychometric functions were fitted using a sigmoid function, and used to determine crossover frequencies (cut-off frequencies at which lowpass and highpass filtering yielded equivalent performance), and gradients at each point of the psychometric functions (change in performance with respect to cut-off frequency). Crossover frequencies and gradients were not significantly different across speakers. Crossover frequencies were not significantly different between E and TFS speech (approximately 1.5 kHz). Gradients were significantly different between E and TFS speech in various filtering conditions. When stimuli were highpass filtered above 2.5 kHz, performance was significantly above chance level and gradients were significantly different from 0 for E speech only. These findings suggest that E and TFS convey important but distinct phonetic cues between 1 and 2 kHz. Unlike TFS, E conveys information up to 6 kHz, consistent with the characteristics of neural phase locking to E and TFS.
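The crossover frequency and gradient defined in the abstract can be illustrated with a minimal sketch. It assumes a four-parameter logistic sigmoid on a log2 frequency axis, which may differ from the sigmoid the authors actually fitted. The function names, the search range, and all parameter values (including the chance level) are hypothetical and chosen only for illustration; they are not data from the study. The fitting step itself is omitted.

```python
import math

def sigmoid(log_f, lower, upper, midpoint, slope):
    """Logistic psychometric function of log2 cut-off frequency (assumed form)."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (log_f - midpoint)))

def gradient(log_f, lower, upper, midpoint, slope):
    """Analytic derivative of the sigmoid with respect to log2 cut-off frequency."""
    s = 1.0 / (1.0 + math.exp(-slope * (log_f - midpoint)))
    return (upper - lower) * slope * s * (1.0 - s)

def crossover(lowpass, highpass, lo_hz=100.0, hi_hz=8000.0, tol=1e-9):
    """Bisect on log2(frequency) for the cut-off where both curves give equal scores."""
    a, b = math.log2(lo_hz), math.log2(hi_hz)

    def diff(x):
        return sigmoid(x, *lowpass) - sigmoid(x, *highpass)

    if diff(a) * diff(b) > 0:
        raise ValueError("curves do not cross in the search range")
    while b - a > tol:
        mid = 0.5 * (a + b)
        if diff(a) * diff(mid) <= 0:
            b = mid  # sign change lies in [a, mid]
        else:
            a = mid  # sign change lies in [mid, b]
    return 2.0 ** (0.5 * (a + b))

# Hypothetical parameters: (lower asymptote, upper asymptote, midpoint in log2 Hz, slope).
# These are illustrative values, NOT results reported in the study.
CHANCE = 0.0625
lowpass_params = (CHANCE, 0.9, math.log2(1500.0), 2.0)    # score rises with cut-off
highpass_params = (CHANCE, 0.9, math.log2(1500.0), -2.0)  # score falls with cut-off

f_cross = crossover(lowpass_params, highpass_params)
g_low = gradient(math.log2(f_cross), *lowpass_params)
print(f"crossover: {f_cross:.1f} Hz, lowpass gradient there: {g_low:.4f} per octave")
```

Working on a log2 axis expresses the gradient as change in proportion correct per octave of cut-off frequency; a linear frequency axis would be an equally valid choice under different assumptions.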
