Band importance for speech-in-speech recognition in the presence of extended high-frequency cues.

Affiliations

Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA.

Department of Otolaryngology/HNS, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.

Publication information

J Acoust Soc Am. 2024 Aug 1;156(2):1202-1213. doi: 10.1121/10.0028269.

DOI: 10.1121/10.0028269

PMID: 39158325

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11335358/

Abstract

Band importance functions for speech-in-noise recognition, typically determined in the presence of steady background noise, indicate a negligible role for extended high frequencies (EHFs; 8-20 kHz). However, recent findings indicate that EHF cues support speech recognition in multi-talker environments, particularly when the masker has reduced EHF levels relative to the target. This scenario can occur in natural auditory scenes when the target talker is facing the listener, but the maskers are not. In this study, we measured the importance of five bands from 40 to 20 000 Hz for speech-in-speech recognition by notch-filtering the bands individually. Stimuli consisted of a female target talker recorded from 0° and a spatially co-located two-talker female masker recorded either from 0° or 56.25°, simulating a masker either facing the listener or facing away, respectively. Results indicated peak band importance in the 0.4-1.3 kHz band and a negligible effect of removing the EHF band in the facing-masker condition. However, in the non-facing condition, the peak was broader and EHF importance was higher and comparable to that of the 3.3-8.3 kHz band in the facing-masker condition. These findings suggest that EHFs contain important cues for speech recognition in listening conditions with mismatched talker head orientations.
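The abstract describes removing one frequency band at a time by notch filtering. As a rough illustration of what removing a band means, the sketch below applies an idealized brick-wall band-stop to a toy two-tone signal by zeroing discrete Fourier transform bins. The abstract does not specify the study's filter design, so the frequency-domain approach, the function names (`notch_band`, `tone_amplitude`), the 16 kHz sample rate, and the test tones are all assumptions made for illustration; only the 0.4-1.3 kHz band edges come from the text.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def notch_band(x, fs, lo_hz, hi_hz):
    """Zero every DFT bin whose frequency lies in [lo_hz, hi_hz].

    Hypothetical helper: an idealized band-stop, not the study's filter.
    """
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Fold negative-frequency bins onto their positive mirror so the
        # spectrum stays conjugate-symmetric and the output stays real.
        freq = min(k, n - k) * fs / n
        if lo_hz <= freq <= hi_hz:
            spectrum[k] = 0
    return idft(spectrum)


def tone_amplitude(x, fs, f_hz):
    """Amplitude of a sinusoid at f_hz, assuming f_hz falls on an exact bin."""
    n = len(x)
    k = round(f_hz * n / fs)
    coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return 2 * abs(coeff) / n


FS = 16000  # assumed sample rate in Hz
N = 160     # 10 ms of signal, giving 100 Hz bin spacing

# Toy signal: one tone inside the 0.4-1.3 kHz band and one outside it.
signal = [
    math.sin(2 * math.pi * 1000 * t / FS) + 0.5 * math.sin(2 * math.pi * 5000 * t / FS)
    for t in range(N)
]

filtered = notch_band(signal, FS, 400, 1300)
```

In this toy case the 1 kHz tone falls inside the removed band and the 5 kHz tone does not, so comparing `tone_amplitude` before and after filtering shows which component was removed. Filtering real speech recordings would more typically use a designed filter with finite transition bands.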
