Sound and Image Processing Laboratory, School of Electrical Engineering, KTH-Royal Institute of Technology, Osquldas väg 10, SE-100 44 Stockholm, Sweden.
J Acoust Soc Am. 2010 Feb;127(2):EL73-9. doi: 10.1121/1.3284545.
It is shown that robust dimension reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data the method outperforms commonly used discriminant-analysis-based dimension-reduction methods, which rely on labeled data. The results indicate that selecting MFCCs in their natural order yields subsets with good performance.
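The selection criterion described in the abstract can be sketched as a greedy search. The sketch below is an assumption, not the paper's exact procedure: it treats "similarity of Euclidean geometry" as the Pearson correlation between pairwise-distance matrices in the candidate feature subspace and in a given perceptual-domain embedding, and the data are synthetic placeholders.

```python
import numpy as np

def pairwise_dists(X):
    # Euclidean distance matrix for the rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def select_features(features, perceptual, k):
    """Greedily pick k feature dimensions whose Euclidean geometry
    best matches the perceptual-domain geometry, measured here by
    Pearson correlation of the upper-triangular distance entries
    (an illustrative choice of similarity, not the paper's)."""
    n, d = features.shape
    iu = np.triu_indices(n, 1)
    target = pairwise_dists(perceptual)[iu]
    chosen, remaining = [], list(range(d))
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in remaining:
            dvec = pairwise_dists(features[:, chosen + [j]])[iu]
            score = np.corrcoef(dvec, target)[0, 1]
            if score > best_score:
                best, best_score = j, score
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 13))                   # e.g. 13 MFCCs per frame
P = X[:, :5] + 0.1 * rng.normal(size=(40, 5))   # toy "perceptual" embedding
subset = select_features(X, P, 3)
print(subset)
```

Note that this unsupervised criterion needs no class labels, matching the abstract's claim; only a perceptual-domain distance (here a toy embedding) is required.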