Arias-Londoño Julián David, Godino-Llorente Juan I, Markaki Maria, Stylianou Yannis
Circuits & Systems Engineering, EUIT de Telecomunicación, Universidad Politécnica de Madrid, Ctra. Valencia, km 7, Madrid 28031, Spain.
Logoped Phoniatr Vocol. 2011 Jul;36(2):60-9. doi: 10.3109/14015439.2010.528788. Epub 2010 Nov 12.
This work presents a novel approach for the automatic detection of pathological voices based on fusing the information extracted by means of mel-frequency cepstral coefficients (MFCC) with features derived from the modulation spectra (MS). The proposed system uses a two-step classification scheme. First, the MFCC and MS features were used to feed two different, independent classifiers; then the outputs of each classifier were used in a second classification stage. In order to establish the configuration that provides the highest detection accuracy, the fusion of information was carried out employing different classifier combination strategies. The experiments were carried out using two different databases: one developed by the Massachusetts Eye and Ear Infirmary Voice Laboratory, and one recorded by the Universidad Politécnica de Madrid. The results show that combining MFCC and MS features with the proposed approach yields an improvement in detection accuracy, demonstrating that the two parameterization methods are complementary.
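For illustration, the following is a minimal sketch of the two-step fusion scheme described above, written in Python with scikit-learn (an assumption; the paper does not specify an implementation). The feature matrices are random placeholders standing in for MFCC and MS features extracted from voice recordings, and the classifier choices (first-stage SVMs, a logistic-regression fusion stage) are illustrative rather than the authors' exact configuration.

# Sketch of a two-step fusion scheme: one classifier per feature set,
# then a second-stage classifier trained on their outputs.
# Feature values are random placeholders, not real MFCC/MS measurements.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200                                          # number of voice recordings
y = rng.integers(0, 2, n)                        # 0 = normal, 1 = pathological
X_mfcc = rng.normal(size=(n, 13)) + y[:, None]   # placeholder MFCC features
X_ms = rng.normal(size=(n, 20)) + y[:, None]     # placeholder MS features

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Step 1: two independent classifiers, one per parameterization.
clf_mfcc = SVC(probability=True).fit(X_mfcc[idx_tr], y[idx_tr])
clf_ms = SVC(probability=True).fit(X_ms[idx_tr], y[idx_tr])

# Step 2: fuse the first-stage outputs (class posteriors) with a second classifier.
def stage1_outputs(idx):
    return np.column_stack([
        clf_mfcc.predict_proba(X_mfcc[idx])[:, 1],
        clf_ms.predict_proba(X_ms[idx])[:, 1],
    ])

fusion = LogisticRegression().fit(stage1_outputs(idx_tr), y[idx_tr])
acc = fusion.score(stage1_outputs(idx_te), y[idx_te])
print(f"Fused detection accuracy on held-out data: {acc:.2f}")

In practice the fusion stage would be trained on out-of-fold first-stage outputs to avoid an optimistic bias; the abstract indicates that several classifier combination strategies were compared to find the best-performing configuration.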