The effects of noise vocoding on speech quality perception.

Affiliation

University of Colorado, Speech, Language, and Hearing Sciences, 2501 Kittredge Loop Road, 409 UCB, Boulder, CO 80309, USA.

Publication information

Hear Res. 2014 Mar;309:75-83. doi: 10.1016/j.heares.2013.11.011. Epub 2013 Dec 11.

Abstract

Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e., the temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e., the temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine whether the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at signal-to-noise ratios of 18 dB and 12 dB. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech.
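
To make the idea of TFS randomization concrete, the following is a minimal sketch of a noise-excited vocoder, not the processing used in the study. It keeps the original signal below a cutoff and, in each band above it, multiplies the band's temporal envelope by band-limited noise. Everything beyond what the abstract states is an assumption made for illustration: the function names (`band_filter`, `hilbert_envelope`, `noise_vocode`), the brick-wall FFT filters, the logarithmic band spacing, the default of 8 bands, and the per-band level matching. The smoothed-carrier variant (condition 2) is not covered.

```python
import numpy as np

def band_filter(x, fs, lo, hi):
    """Zero-phase brick-wall band-pass [lo, hi) via FFT masking (a simplification)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def hilbert_envelope(x):
    """Temporal envelope: magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep Nyquist once
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def noise_vocode(x, fs, cutoff_hz=1500.0, n_bands=8, seed=0):
    """Replace the TFS above cutoff_hz with noise; leave the signal below it intact.

    n_bands and the log spacing are illustrative assumptions, not the study's values.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(x))
    edges = np.geomspace(cutoff_hz, fs / 2.0, n_bands + 1)
    out = band_filter(x, fs, 0.0, cutoff_hz)       # original TFS preserved here
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = band_filter(x, fs, lo, hi)
        env = hilbert_envelope(band)               # slow amplitude changes (TE)
        carrier = band_filter(noise, fs, lo, hi)   # random fine structure
        voc = band_filter(env * carrier, fs, lo, hi)
        rms_band = np.sqrt(np.mean(band ** 2))
        rms_voc = np.sqrt(np.mean(voc ** 2))
        if rms_voc > 0.0:
            voc *= rms_band / rms_voc              # match the band's original level
        out += voc
    return out
```

Raising `cutoff_hz` in such a sketch leaves more of the original TFS intact, which is the kind of parametric variation the abstract describes.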
