Classifying human voices by using hybrid SFX time-series preprocessing and ensemble feature selection.

Affiliation

Department of Computer and Information Science, University of Macau, Macau.

Publication information

Biomed Res Int. 2013;2013:720834. doi: 10.1155/2013/720834. Epub 2013 Oct 29.

DOI:10.1155/2013/720834

PMID:24288684

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3830839/

Abstract

Voice is a physiological biometric characteristic that differs from one person to another. Because of this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), and emotional state, as well as in identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX), which treats an audio waveform as a time series and applies a piecewise transformation to extract important features for training a classification model. Using SFX, we can faithfully remodel the statistical characteristics of the time series; combined with spectral analysis, this yields a substantial number of features. An ensemble is then used to select only the influential features for classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results, supporting the claim that heuristically choosing significant features from both the time and frequency domains produces better performance in voice classification than traditional signal processing techniques alone, such as wavelets and LPC-to-CC.
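The idea of combining piecewise statistics of a waveform with a spectral feature can be illustrated with a minimal sketch. The abstract does not list the actual SFX feature set, so everything here is an assumption made for illustration: the function names (`piecewise_stats`, `dominant_frequency`, `extract_features`), the four per-segment statistics, the fixed equal-length segmentation, and the use of a naive DFT peak as the only spectral feature. The ensemble feature selection step is not shown.

```python
import cmath
import math
import statistics


def piecewise_stats(signal, n_segments):
    """Split the signal into equal-length segments and summarise each one.

    Hypothetical stand-in for a piecewise statistical transformation;
    the statistics chosen here are illustrative, not the paper's list.
    """
    seg_len = len(signal) // n_segments
    if seg_len < 2:
        raise ValueError("each segment needs at least two samples")
    features = []
    for i in range(n_segments):
        seg = signal[i * seg_len:(i + 1) * seg_len]
        features += [
            statistics.fmean(seg),   # segment mean
            statistics.pstdev(seg),  # segment standard deviation
            min(seg),                # segment minimum
            max(seg),                # segment maximum
        ]
    return features


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):  # skip the DC bin, stop at Nyquist
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def extract_features(signal, sample_rate, n_segments=4):
    """Concatenate time-domain segment statistics with one spectral feature."""
    return piecewise_stats(signal, n_segments) + [
        dominant_frequency(signal, sample_rate)
    ]


# Synthetic example: one second of an 8 Hz sine sampled at 64 Hz.
sample_rate = 64
tone = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(sample_rate)]
features = extract_features(tone, sample_rate)
```

With four segments the resulting vector has 4 × 4 time-domain values plus one frequency value. In a real pipeline a feature selection step would then prune this vector before a classifier is trained.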
