Mirhassani Seyed Mostafa, Zourmand Alireza, Ting Hua-Nong
Biomedical Engineering Department, Faculty of Engineering, University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia.
ScientificWorldJournal. 2014;2014:534064. doi: 10.1155/2014/534064. Epub 2014 Jun 5.
Automatic estimation of a speaker's age is a challenging research topic in speech analysis. In this paper, a novel approach to estimating a speaker's age is presented. The method follows a "divide and conquer" strategy in which the speech data are divided into six groups according to vowel class. This strategy is motivated by two observations. First, reducing the complexity of the data distribution that each classifier must learn improves its learning performance. Second, different vowel classes carry complementary information for age estimation. Mel-frequency cepstral coefficients (MFCCs) are computed for each group, and single-hidden-layer feed-forward neural networks trained with a self-adaptive extreme learning machine are applied to these features to make primary decisions. Subsequently, fuzzy data fusion aggregates the classifiers' outputs into an overall decision. The results are then compared with several state-of-the-art age estimation methods. Experiments on six age groups, including children aged between 7 and 12 years, showed that fuzzy fusion of the classifiers' outputs resulted in a considerable improvement in age estimation accuracy of up to 53.33%. Moreover, fuzzy fusion of the decisions aggregated complementary information about a speaker's age from the various speech sources.
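To make the per-group classifier concrete, here is a minimal pure-Python sketch of a basic extreme learning machine (ELM): the hidden layer of a single-hidden-layer network is drawn at random and never trained, and only the output weights are solved in closed form by regularized least squares. Everything here is an illustrative assumption, not the authors' implementation: the names `ELM`, `solve_linear`, `n_hidden` and `reg`, the sigmoid activation, the uniform(-1, 1) initialization and the ridge term. The self-adaptive part of the paper's ELM, which additionally tunes the hidden layer, is omitted. In the described system one such network would be trained per vowel-class group on that group's MFCC vectors; toy 2-D points stand in for them.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists, b is an n x m list of lists; returns x (n x m).
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class ELM:
    """Hypothetical basic ELM classifier (no self-adaptive hidden-layer tuning)."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg                  # ridge term keeps H^T H invertible
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid outputs of the fixed, randomly initialized hidden layer.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, X, y, n_classes):
        d, L = len(X[0]), self.n_hidden
        # Step 1: draw hidden weights and biases at random; they are never trained.
        self.w = [[self.rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1.0, 1.0) for _ in range(L)]
        H = [self._hidden(x) for x in X]                        # N x L
        T = [[1.0 if c == lab else 0.0 for c in range(n_classes)] for lab in y]
        # Step 2: output weights beta = (H^T H + reg * I)^-1 H^T T.
        HtH = [[sum(h[i] * h[j] for h in H) + (self.reg if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        HtT = [[sum(h[i] * t[c] for h, t in zip(H, T)) for c in range(n_classes)]
               for i in range(L)]
        self.beta = solve_linear(HtH, HtT)                      # L x C
        return self

    def scores(self, x):
        # Network output per class (the group's "primary decision" scores).
        h = self._hidden(x)
        return [sum(hi * row[c] for hi, row in zip(h, self.beta))
                for c in range(len(self.beta[0]))]

    def predict(self, x):
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)


# Toy 2-D points stand in for one vowel group's MFCC vectors.
X = [[-2.0, -2.0], [-2.5, -1.5], [-1.5, -2.5], [-2.0, -1.0],
     [2.0, 2.0], [2.5, 1.5], [1.5, 2.5], [2.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
elm = ELM(n_hidden=20, reg=1e-3, seed=0).fit(X, y, n_classes=2)
print([elm.predict(x) for x in X])
```

Solving for the output weights in one linear system, rather than by iterative backpropagation, is what makes ELM training fast; the small ridge term is an added assumption that keeps the system well conditioned when hidden features are correlated.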
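The abstract does not state which fuzzy operator aggregates the per-group classifiers' outputs. As one standard instance of fuzzy classifier fusion, swapped in here for illustration and not confirmed as the authors' choice, the sketch below computes the Sugeno fuzzy integral over a λ-fuzzy measure. Each classifier k gets a fuzzy density `g[k]` (its importance); λ is chosen so that the measure of the full set of classifiers equals 1, and for each class the fused support is the maximum over i of min(support of the i-th most supportive classifier, measure of the top-i set). The function names, the density values and the assumption that supports are already scaled to [0, 1] are all hypothetical.

```python
def lambda_measure(g):
    """Return lambda of the Sugeno lambda-fuzzy measure for densities g.

    Solves prod(1 + lam * g_i) = 1 + lam with lam > -1 by bisection.
    Assumes len(g) >= 2 and 0 < g_i < 1.
    """
    s = sum(g)
    if abs(s - 1.0) < 1e-12:
        return 0.0                      # densities already add up to 1

    def f(lam):
        p = 1.0
        for gi in g:
            p *= 1.0 + lam * gi
        return p - (1.0 + lam)

    eps = 1e-9
    if s < 1.0:
        lo, hi = eps, 1.0               # root is positive
        while f(hi) < 0.0:              # widen the bracket until f changes sign
            hi *= 2.0
    else:
        lo, hi = -1.0 + eps, -eps       # root lies in (-1, 0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def sugeno_fuse(supports, g):
    """Fuse classifier outputs with the Sugeno fuzzy integral.

    supports[k][c]: support of classifier k for class c, scaled to [0, 1].
    g[k]: fuzzy density (importance) of classifier k.
    Returns one fused support value per class.
    """
    lam = lambda_measure(g)
    fused = []
    for c in range(len(supports[0])):
        # Visit classifiers from most to least supportive of class c.
        order = sorted(range(len(g)), key=lambda k: supports[k][c], reverse=True)
        g_a, best = 0.0, 0.0
        for k in order:
            # Measure of the set of the i most supportive classifiers so far.
            g_a = g[k] + g_a + lam * g[k] * g_a
            best = max(best, min(supports[k][c], g_a))
        fused.append(best)
    return fused


# Hypothetical example: three per-group classifiers, two age classes.
supports = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]]
fused = sugeno_fuse(supports, g=[0.4, 0.3, 0.2])
print(fused, "-> class", max(range(len(fused)), key=fused.__getitem__))
```

In this example the single most important classifier favors class 0, but the two classifiers that favor class 1 have a larger combined measure (about 0.52 versus 0.4), so the fused decision is class 1. This is the kind of complementary evidence that a fuzzy aggregation can exploit and a single classifier cannot.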