The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection.

Affiliation

Department of Signal Processing and Acoustics, Aalto University, Finland.

Publication information

J Voice. 2024 Sep;38(5):975-982. doi: 10.1016/j.jvoice.2022.03.021. Epub 2022 Apr 27.

Abstract

Automatic voice pathology detection is a research topic that has gained increasing interest recently. Although methods based on deep learning are becoming popular, classical pipeline systems based on a two-stage architecture, consisting of a feature extraction stage and a classifier stage, are still widely used. In these classical detection systems, frame-wise computation of mel-frequency cepstral coefficients (MFCCs) is the most popular feature extraction method. However, no systematic study has investigated the effect of the MFCC frame length on automatic voice pathology detection. In this work, we studied the effect of the MFCC frame length in voice pathology detection using three disorders (hyperkinetic dysphonia, hypokinetic dysphonia, and reflux laryngitis) from the Saarbrücken Voice Disorders (SVD) database. Detection performance was compared between speaker-dependent and speaker-independent scenarios as well as between speaking task-dependent and speaking task-independent scenarios. The support vector machine (SVM), the most widely used classifier in this research area, was used as the classifier. The results show that detection accuracy depended on the MFCC frame length in all the scenarios studied. The best detection accuracy was obtained with an MFCC frame length of 500 ms and a frame shift of 5 ms.
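To make the frame length and shift figures concrete, the following minimal sketch counts how many analysis frames a signal yields for a given frame length and shift. It is not the authors' implementation: the function name `frame_starts`, the 16 kHz sampling rate, the 1 s signal duration, and the choice to drop trailing partial frames (no padding) are all illustrative assumptions.

```python
def frame_starts(num_samples, sample_rate_hz, frame_ms, shift_ms):
    """Return the start index (in samples) of every full analysis frame.

    Hypothetical helper for illustration: frames that would run past the
    end of the signal are dropped (no zero-padding is applied).
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000))
    shift = int(round(sample_rate_hz * shift_ms / 1000))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be positive")
    if num_samples < frame_len:
        return []  # signal is shorter than a single frame
    return list(range(0, num_samples - frame_len + 1, shift))


# Assumed example signal: 1 second sampled at 16 kHz.
SAMPLE_RATE_HZ = 16000
NUM_SAMPLES = SAMPLE_RATE_HZ * 1

# Long frames as reported best in the abstract: 500 ms length, 5 ms shift.
long_frames = frame_starts(NUM_SAMPLES, SAMPLE_RATE_HZ, frame_ms=500, shift_ms=5)

# A short 25 ms frame with the same 5 ms shift, for comparison (assumed value).
short_frames = frame_starts(NUM_SAMPLES, SAMPLE_RATE_HZ, frame_ms=25, shift_ms=5)

# Fraction of each frame shared with the next one at 500 ms / 5 ms.
overlap_fraction = (500 - 5) / 500

print(len(long_frames), len(short_frames), overlap_fraction)
```

Under these assumptions, a longer frame yields fewer feature vectors per recording, and at 500 ms with a 5 ms shift consecutive frames share 495 ms of signal. Each frame would then be passed to an MFCC routine, and the resulting vectors to the classifier.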

Similar articles

Multidirectional regression (MDR)-based features for automatic voice disorder detection.

J Voice. 2012 Nov;26(6):817.e19-27. doi: 10.1016/j.jvoice.2012.05.002.

Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

J Voice. 2017 May;31(3):386.e1-386.e8. doi: 10.1016/j.jvoice.2016.09.009. Epub 2016 Oct 10.

Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach.

J Voice. 2019 Sep;33(5):634-641. doi: 10.1016/j.jvoice.2018.02.003. Epub 2018 Mar 19.

Automatic Voice Pathology Detection With Running Speech by Using Estimation of Auditory Spectrum and Cepstral Coefficients Based on the All-Pole Model.

J Voice. 2016 Nov;30(6):757.e7-757.e19. doi: 10.1016/j.jvoice.2015.08.010. Epub 2015 Oct 27.

On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices.

Logoped Phoniatr Vocol. 2011 Jul;36(2):60-9. doi: 10.3109/14015439.2010.528788. Epub 2010 Nov 12.

Investigation of Voice Pathology Detection and Classification on Different Frequency Regions Using Correlation Functions.

J Voice. 2017 Jan;31(1):3-15. doi: 10.1016/j.jvoice.2016.01.014. Epub 2016 Mar 15.

An Investigation of Multidimensional Voice Program Parameters in Three Different Databases for Voice Pathology Detection and Classification.

J Voice. 2017 Jan;31(1):113.e9-113.e18. doi: 10.1016/j.jvoice.2016.03.019. Epub 2016 Apr 19.

Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework.

Int J Lang Commun Disord. 2023 Mar;58(2):279-294. doi: 10.1111/1460-6984.12783. Epub 2022 Sep 18.

Towards objective evaluation of perceived roughness and breathiness: an approach based on mel-frequency cepstral analysis.

Logoped Phoniatr Vocol. 2011 Jul;36(2):52-9. doi: 10.3109/14015439.2010.517551. Epub 2010 Sep 17.

Cited by

Optimizing MFCC Parameters for Breathing Phase Detection.

Sensors (Basel). 2025 Aug 13;25(16):5002. doi: 10.3390/s25165002.

A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection.

Sci Rep. 2023 Dec 20;13(1):22719. doi: 10.1038/s41598-023-49869-6.

End-to-end deep learning classification of vocal pathology using stacked vowels.

Laryngoscope Investig Otolaryngol. 2023 Aug 31;8(5):1312-1318. doi: 10.1002/lio2.1144. eCollection 2023 Oct.
