Wirtzfeld Michael R, Pourmand Nazanin, Parsa Vijay, Bruce Ian C
Department of Electrical and Computer Engineering, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada.
Knowles Intelligent Audio, Mountain View, California 94043, USA.
J Acoust Soc Am. 2017 Sep;142(3):EL319. doi: 10.1121/1.5003785.
Objective measures are commonly used in the development of speech coding algorithms as an adjunct to human subjective evaluation. Predictors of speech quality based on models of physiological or perceptual processing tend to perform better than measures based on simple acoustical properties. Here, a modeling method based on a detailed physiological model and a neurogram similarity measure is developed and optimized to predict the quality of an enhanced wideband speech dataset. A model capturing temporal modulations in neural activity up to 267 Hz was found to perform as well as or better than several existing objective quality measures.
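The abstract names a neurogram similarity measure but does not define it. One common choice in this line of work is the Neurogram Similarity Index Measure (NSIM) of Hines and Harte. NSIM is an SSIM-style index that keeps the intensity (luminance) and structure terms and drops the contrast term. The sketch below assumes that choice and treats neurograms as precomputed 2-D arrays (characteristic-frequency channels × time bins) produced by some auditory-periphery model, which is not reproduced here. It uses a 3 × 3 Gaussian window (σ = 0.5) and the constants C1 = (0.01L)² and C3 = (0.03L)²/2, as recalled from the NSIM literature. All of these settings, and the helper names `gaussian_window`, `local_weighted_mean` and `nsim`, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_window(size=3, sigma=0.5):
    """Normalized 2-D Gaussian weighting window (assumed NSIM default: 3x3, sigma=0.5)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()


def local_weighted_mean(x, w):
    """Weighted mean over every window that fits entirely inside x ('valid' filtering)."""
    patches = sliding_window_view(x, w.shape)  # shape (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", patches, w)


def nsim(ref, deg, window=None, k1=0.01, k3=0.03):
    """SSIM-style similarity of a degraded neurogram to a reference neurogram.

    Hypothetical sketch: keeps the intensity and structure terms (exponents 1)
    and omits the contrast term, as NSIM does. Returns the mean over all windows.
    """
    ref = np.asarray(ref, dtype=float)
    deg = np.asarray(deg, dtype=float)
    if ref.shape != deg.shape:
        raise ValueError("neurograms must have the same shape")
    if window is None:
        window = gaussian_window()

    # Dynamic range of the reference neurogram (assumed); guard against a flat input.
    L = (ref.max() - ref.min()) or 1.0
    c1 = (k1 * L) ** 2
    c3 = (k3 * L) ** 2 / 2.0

    # Local weighted statistics within each window.
    mu_r = local_weighted_mean(ref, window)
    mu_d = local_weighted_mean(deg, window)
    var_r = np.maximum(local_weighted_mean(ref * ref, window) - mu_r ** 2, 0.0)
    var_d = np.maximum(local_weighted_mean(deg * deg, window) - mu_d ** 2, 0.0)
    cov = local_weighted_mean(ref * deg, window) - mu_r * mu_d

    intensity = (2.0 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    structure = (cov + c3) / (np.sqrt(var_r * var_d) + c3)
    return float(np.mean(intensity * structure))
```

A neurogram compared with itself scores 1. Distortions that shift the mean firing level or disrupt the time-frequency pattern pull the score down. Whether such a score is then mapped to subjective quality ratings, and how, is outside this sketch.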