Department of Electrical Engineering and Brain Science Research Center, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong Yuseong-gu, Daejeon 305-701, Republic of Korea.
Neural Netw. 2013 Sep;45:62-9. doi: 10.1016/j.neunet.2013.02.006. Epub 2013 Mar 7.
A nonlinear speech feature extraction algorithm was developed by modeling human cochlear functions and was demonstrated as a noise-robust front-end for speech recognition systems. The algorithm is based on a model of the organ of Corti in the human cochlea, with components such as the basilar membrane (BM), outer hair cells (OHCs), and inner hair cells (IHCs). The frequency-dependent nonlinear compression and amplification performed by the OHCs were modeled by lateral inhibition to enhance spectral contrasts. In particular, the compression coefficients were made frequency dependent, based on psychoacoustic evidence. Spectral subtraction and temporal adaptation were applied in the time-frame domain. With long-term and short-term adaptation characteristics, these operations remove stationary or slowly varying components and amplify temporal changes such as onsets and offsets. The proposed features were evaluated on a noisy speech database and performed better than baseline methods such as mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP in unknown noisy conditions.
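The three processing ideas named in the abstract (lateral inhibition across frequency channels, frequency-dependent compression, and temporal adaptation across frames) can be illustrated with a minimal sketch. This is not the authors' model: the function names, the nearest-neighbor inhibition kernel, the power-law compression, the single first-order adaptation filter, and all parameter values (`strength`, `exponents`, `rate`) are assumptions chosen only to make the ideas concrete.

```python
def lateral_inhibition(spectrum, strength=0.5):
    """Enhance spectral contrast: subtract a fraction of the mean of the
    two neighboring channels, then clip negative values to zero.
    Edge channels reuse themselves as the missing neighbor."""
    n = len(spectrum)
    out = []
    for k in range(n):
        left = spectrum[max(k - 1, 0)]
        right = spectrum[min(k + 1, n - 1)]
        inhibited = spectrum[k] - strength * 0.5 * (left + right)
        out.append(max(inhibited, 0.0))
    return out


def compress(spectrum, exponents):
    """Frequency-dependent compression: each channel has its own
    power-law exponent (hypothetical values, one per channel)."""
    return [x ** a for x, a in zip(spectrum, exponents)]


def temporal_adaptation(frames, rate=0.1):
    """Subtract a slowly updated per-channel running mean from each frame.
    Stationary components are removed; onsets and offsets stand out."""
    state = list(frames[0])  # running mean, initialized from the first frame
    out = []
    for frame in frames:
        out.append([x - s for x, s in zip(frame, state)])
        state = [s + rate * (x - s) for x, s in zip(frame, state)]
    return out


if __name__ == "__main__":
    # A spectral peak flanked by weaker channels is sharpened.
    print(lateral_inhibition([1.0, 4.0, 1.0]))
    # A step onset in one channel produces a response that then decays.
    print(temporal_adaptation([[0.0], [0.0], [1.0], [1.0]], rate=0.5))
```

In this sketch a smaller `rate` corresponds to longer-term adaptation and a larger `rate` to shorter-term adaptation; the abstract mentions both time scales, which would presumably require at least two such filters with different rates.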