Dynamic formant tracking of noisy speech using temporal analysis on outputs from a nonlinear cochlear model.

Author information

Deng L, Kheirallah I

Affiliation

Department of Electrical and Computer Engineering, University of Waterloo, Ont., Canada.

Publication information

IEEE Trans Biomed Eng. 1993 May;40(5):456-67. doi: 10.1109/10.243416.

DOI:10.1109/10.243416

PMID:8225334

Abstract

In this paper we take a modeling approach to studying the representation of formant frequencies of spoken speech and of speech in noise in the temporal responses of the peripheral auditory system. On the basis of the properties of this representation, we have devised and evaluated a cross-channel correlation algorithm and an interpeak interval analysis for automatic formant extraction from speech that is strongly dynamic in its acoustic characteristics and embedded in noise. The basilar membrane model used in this study contains laterally coupled damping elements, which are made monotonically dependent on the spatial distribution of the short-term power in the outputs of the model. An efficient digital implementation of the model and its salient numerical properties are described. Simulation results from the model in response to speech and speech in noise illustrate temporal response patterns that are tonotopically organized in relation to speech formant parameters and are little influenced by the noise level. By exploiting these relations, the devised cross-channel correlation algorithm is shown to track formant movements accurately in spoken syllables and sentences.
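The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that per-channel temporal outputs are already available from some cochlear model or filterbank (here replaced by synthetic tones), groups tonotopically adjacent channels whose zero-lag Pearson correlation exceeds a threshold, and estimates each group's frequency from the mean interval between successive positive peaks. The function names `interpeak_frequency`, `correlation`, and `track_formants`, and the parameters `threshold` and `min_group`, are hypothetical.

```python
import math
from typing import List, Sequence


def interpeak_frequency(x: Sequence[float], fs: float) -> float:
    """Hypothetical interpeak-interval estimate: fs divided by the mean
    spacing (in samples) between successive positive local maxima."""
    peaks = [n for n in range(1, len(x) - 1)
             if x[n] > 0 and x[n - 1] < x[n] >= x[n + 1]]
    if len(peaks) < 2:
        return 0.0
    # Mean interval = span from first to last peak / number of intervals.
    mean_interval = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return fs / mean_interval


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized (Pearson) correlation of two channel outputs."""
    n = min(len(a), len(b))
    ma = sum(a[:n]) / n
    mb = sum(b[:n]) / n
    num = sum((a[i] - ma) * (b[i] - mb) for i in range(n))
    da = math.sqrt(sum((a[i] - ma) ** 2 for i in range(n)))
    db = math.sqrt(sum((b[i] - mb) ** 2 for i in range(n)))
    return num / (da * db) if da > 0 and db > 0 else 0.0


def track_formants(channels: List[Sequence[float]], fs: float,
                   threshold: float = 0.9, min_group: int = 3) -> List[float]:
    """Assumed scheme: channels are ordered tonotopically; runs of adjacent,
    highly correlated channels form a group, and each sufficiently large
    group yields one frequency estimate (mean interpeak frequency)."""
    groups = []
    current = [0]
    for k in range(1, len(channels)):
        if correlation(channels[k - 1], channels[k]) >= threshold:
            current.append(k)
        else:
            groups.append(current)
            current = [k]
    groups.append(current)
    estimates = []
    for g in groups:
        if len(g) >= min_group:
            freqs = [interpeak_frequency(channels[k], fs) for k in g]
            estimates.append(sum(freqs) / len(freqs))
    return estimates


if __name__ == "__main__":
    fs = 16000.0
    n = 800  # 50 ms of synthetic "channel output"

    def tone(f: float, phase: float) -> List[float]:
        return [math.sin(2 * math.pi * f * i / fs + phase) for i in range(n)]

    # Stand-in for model outputs: four channels phase-locked near 500 Hz,
    # four near 1500 Hz (illustrative values, not data from the paper).
    chans = ([tone(500.0, 0.1 * k) for k in range(4)]
             + [tone(1500.0, 0.1 * k) for k in range(4)])
    print([round(f) for f in track_formants(chans, fs)])
```

Zero-lag correlation is used here only because it is the simplest cross-channel similarity measure; a lagged cross-correlation, short-time framing for tracking formant movement over time, and an actual nonlinear basilar membrane model would all be needed for anything closer to the system described in the abstract.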
