Age estimation based on children's voice: a fuzzy-based decision fusion strategy.

Authors

Mirhassani Seyed Mostafa, Zourmand Alireza, Ting Hua-Nong

Affiliation

Biomedical Engineering Department, Faculty of Engineering, University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia.

Publication information

ScientificWorldJournal. 2014;2014:534064. doi: 10.1155/2014/534064. Epub 2014 Jun 5.

DOI:10.1155/2014/534064

PMID:25006595

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4070543/

Abstract

Automatic estimation of a speaker's age is a challenging research topic in the area of speech analysis. In this paper, a novel approach to estimating a speaker's age is presented. The method features a "divide and conquer" strategy wherein the speech data are divided into six groups based on the vowel classes. There are two reasons behind this strategy. First, reducing the complexity of the data distribution that each classifier must model improves the classifier's learning performance. Second, different vowel classes contain complementary information for age estimation. Mel-frequency cepstral coefficients are computed for each group, and single-hidden-layer feed-forward neural networks based on the self-adaptive extreme learning machine are applied to the features to make a primary decision. Subsequently, fuzzy data fusion is employed to provide an overall decision by aggregating the classifiers' outputs. The results are then compared with a number of state-of-the-art age estimation methods. Experiments conducted on six age groups covering children aged between 7 and 12 years revealed that fuzzy fusion of the classifiers' outputs resulted in a considerable improvement of up to 53.33% in age estimation accuracy. Moreover, the fuzzy fusion of decisions aggregated the complementary information about a speaker's age from various speech sources.
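The pipeline described in the abstract (one classifier per vowel group, followed by fusion of their outputs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the features are synthetic stand-ins for Mel-frequency cepstral coefficients, `BasicELM` is a plain extreme learning machine rather than the self-adaptive variant used in the paper, and `fuse` simply averages normalized membership degrees, which is a stand-in for the paper's fuzzy fusion rule (not specified in the abstract). All names and parameter values are hypothetical.

```python
import numpy as np


class BasicELM:
    """Plain extreme learning machine (NOT the self-adaptive variant):
    random fixed hidden layer, output weights solved by least squares."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def scores(self, X):
        return self._hidden(X) @ self.beta


def to_memberships(scores):
    """Map raw classifier scores to membership degrees in [0, 1]
    that sum to 1 for each sample (an illustrative choice)."""
    s = np.clip(scores, 0.0, None) + 1e-12
    return s / s.sum(axis=1, keepdims=True)


def fuse(memberships):
    """Assumed aggregator: mean membership over vowel groups,
    then pick the age class with the highest fused degree."""
    fused = np.mean(memberships, axis=0)
    return fused, fused.argmax(axis=1)


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_groups, n_classes, n_feat = 6, 6, 13  # six vowel groups, six age classes
    n_train, n_test = 300, 60
    y_train = rng.integers(0, n_classes, n_train)
    y_test = rng.integers(0, n_classes, n_test)

    memberships = []
    for _ in range(n_groups):
        # Synthetic per-group features standing in for MFCC vectors.
        centers = rng.normal(size=(n_classes, n_feat))
        X_train = centers[y_train] + rng.normal(scale=1.5, size=(n_train, n_feat))
        X_test = centers[y_test] + rng.normal(scale=1.5, size=(n_test, n_feat))
        elm = BasicELM(n_hidden=40, rng=rng).fit(X_train, y_train, n_classes)
        memberships.append(to_memberships(elm.scores(X_test)))

    fused, predicted = fuse(np.stack(memberships))
    return fused, predicted, y_test


if __name__ == "__main__":
    fused, predicted, y_test = demo()
    print("fused accuracy on synthetic data:", np.mean(predicted == y_test))
```

Averaging is only one possible aggregator; it was chosen here because it keeps each fused row a valid set of membership degrees. Accuracy on this synthetic data says nothing about the results reported in the paper.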

Similar articles

A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation.

Sensors (Basel). 2021 Jul 28;21(15):5097. doi: 10.3390/s21155097.

The speaker's formant in male voices.

J Voice. 1997 Dec;11(4):422-8. doi: 10.1016/s0892-1997(97)80038-0.

Noise and a Speaker's Impaired Voice Quality Disrupt Spoken Language Processing in School-Aged Children: Evidence From Performance and Response Time Measures.

J Speech Lang Hear Res. 2020 Jul 20;63(7):2115-2131. doi: 10.1044/2020_JSLHR-19-00348. Epub 2020 Jun 22.

Does the speaker's voice quality influence children's performance on a language comprehension test?

Int J Speech Lang Pathol. 2015 Feb;17(1):63-73. doi: 10.3109/17549507.2014.898098. Epub 2014 Apr 13.

Consistency of voice frequency and perturbation measures in children using cepstral analyses: a movement toward increased recording stability.

JAMA Otolaryngol Head Neck Surg. 2013 Aug 1;139(8):811-6. doi: 10.1001/jamaoto.2013.3926.

Automatic Voice Pathology Detection With Running Speech by Using Estimation of Auditory Spectrum and Cepstral Coefficients Based on the All-Pole Model.

J Voice. 2016 Nov;30(6):757.e7-757.e19. doi: 10.1016/j.jvoice.2015.08.010. Epub 2015 Oct 27.

Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach.

J Voice. 2019 Sep;33(5):634-641. doi: 10.1016/j.jvoice.2018.02.003. Epub 2018 Mar 19.

EGG open quotient in aging voices--changes with increasing chronological age and its perception.

Logoped Phoniatr Vocol. 2006;31(2):51-6. doi: 10.1080/14015430500445534.

A dynamic neuro-fuzzy model providing bio-state estimation and prognosis prediction for wearable intelligent assistants.

J Neuroeng Rehabil. 2005 Jun 28;2:15. doi: 10.1186/1743-0003-2-15.
