Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain.

Author information

Relaño-Iborra Helia, May Tobias, Zaar Johannes, Scheidiger Christoph, Dau Torsten

Affiliation

Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.

Publication information

J Acoust Soc Am. 2016 Oct;140(4):2670. doi: 10.1121/1.4964505.

PMID:27794330

Abstract

A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436-446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125-2136]. This "hybrid" model, named sEPSMcorr, is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time-frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing.
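As a rough illustration of the kind of correlation back end described in the abstract, the Python sketch below averages short-time Pearson correlations between clean and degraded envelope signals over audio channels and modulation filters. It assumes the auditory front end has already produced modulation-filtered envelopes, and it assumes, following the multi-resolution idea of mr-sEPSM, a window length of 1/fc seconds for a modulation filter centered at fc. The function names (`pearson`, `windowed_correlation`, `correlation_index`) are hypothetical; the proposed model may normalize, weight, or combine the correlations differently, and converting the resulting index into percent correct would require a further fitted mapping that is not shown here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length segments; returns 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0


def windowed_correlation(clean: Sequence[float], degraded: Sequence[float],
                         fs: float, fc: float) -> float:
    """Mean correlation over non-overlapping windows of about 1/fc seconds.

    Assumption: window duration is the inverse of the modulation filter
    center frequency fc (multi-resolution segmentation, as in mr-sEPSM).
    """
    win = max(2, int(round(fs / fc)))  # samples per window
    scores = []
    for start in range(0, len(clean) - win + 1, win):
        scores.append(pearson(clean[start:start + win], degraded[start:start + win]))
    return sum(scores) / len(scores) if scores else 0.0


def correlation_index(clean_channels: List[List[List[float]]],
                      degraded_channels: List[List[List[float]]],
                      fs: float, mod_fcs: Sequence[float]) -> float:
    """Average windowed correlation across audio channels and modulation filters.

    clean_channels[i][j] is the envelope of audio channel i after modulation
    filter j (hypothetical data layout); degraded_channels has the same shape.
    """
    total, count = 0.0, 0
    for clean_mod, deg_mod in zip(clean_channels, degraded_channels):
        for j, fc in enumerate(mod_fcs):
            total += windowed_correlation(clean_mod[j], deg_mod[j], fs, fc)
            count += 1
    return total / count if count else 0.0
```

For example, with an envelope sampling rate of 100 Hz, an undistorted envelope yields an index of 1.0, a sign-inverted envelope yields -1.0, and a flat (fully masked) envelope yields 0.0, because each short-time window correlates perfectly, inversely, or not at all with the clean reference. Averaging per-window correlations, rather than correlating whole signals, is what makes such a back end sensitive to local, short-time envelope distortions.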
