Detection of Neurogenic Voice Disorders Using the Fisher Vector Representation of Cepstral Features.

Authors

Yagnavajjula Madhu Keerthana, Alku Paavo, Rao Krothapalli Sreenivasa, Mitra Pabitra

Affiliations

Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland.

Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland.

Publication information

J Voice. 2025 May;39(3):757-763. doi: 10.1016/j.jvoice.2022.10.016. Epub 2022 Nov 21.

DOI:10.1016/j.jvoice.2022.10.016

PMID:36424242

Abstract

Neurogenic voice disorders (NVDs) are caused by damage to, or malfunction of, the central or peripheral nervous system that controls vocal fold movement. In this paper, we investigate the potential of Fisher vector (FV) encoding for the automatic detection of people with NVDs. FVs are used to convert features from the frame level (local descriptors) to the utterance level (global descriptors). At the frame level, we extract two popular cepstral representations from acoustic voice signals: Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction cepstral coefficients (PLPCCs). In addition, MFCC features are extracted from every frame of the glottal source signal, which is computed using a glottal inverse filtering (GIF) technique. The global descriptors derived from the local descriptors are used to train a support vector machine (SVM) classifier. Experiments are conducted using voice signals from 80 healthy speakers and 80 patients with NVDs (40 with spasmodic dysphonia (SD) and 40 with recurrent laryngeal nerve palsy (RLNP)) taken from the Saarbruecken voice disorder (SVD) database. The overall results indicate that FV encoding leads to better identification of people with NVDs than the de facto temporal encoding. Furthermore, the SVM trained on the combination of FVs derived from the cepstral and glottal features provides the best overall detection performance.
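
The frame-to-utterance conversion described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used Fisher vector formulation over a diagonal-covariance Gaussian mixture model (gradients with respect to the component means and standard deviations); the abstract does not state which FV variant or normalization the authors use, so this should not be read as their implementation. The function name fisher_vector, the toy GMM parameters, and the random stand-in frames are all hypothetical.

```python
import numpy as np

def fisher_vector(frames, weights, means, variances):
    """Encode frame-level features as one utterance-level Fisher vector.

    frames:    (T, D) local descriptors, e.g. one cepstral vector per frame
    weights:   (K,)   GMM mixture weights (assumed already trained)
    means:     (K, D) GMM component means
    variances: (K, D) diagonal GMM covariances
    Returns a vector of length 2 * K * D.
    """
    T = frames.shape[0]
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)

    # Log-likelihood of every frame under every diagonal Gaussian: (T, K)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Posterior (soft assignment) of each frame to each component
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                  # (T, K)

    # Gradients w.r.t. means and standard deviations, averaged over frames
    z = diff / np.sqrt(variances)                            # standardized residuals
    g_mu = np.einsum("tk,tkd->kd", post, z) / (T * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum("tk,tkd->kd", post, z ** 2 - 1.0)
               / (T * np.sqrt(2.0 * weights)[:, None]))
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Toy usage: 200 random "frames" of 13 coefficients and a hypothetical 4-component GMM
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
weights = np.full(4, 0.25)
means = rng.normal(size=(4, 13))
variances = np.ones((4, 13))
fv = fisher_vector(frames, weights, means, variances)
print(fv.shape)
```

In a full pipeline the GMM would be fitted on training frames rather than drawn at random, and one such fixed-length vector per utterance would be passed to the classifier (for example scikit-learn's sklearn.svm.SVC). Because the output length depends only on K and D, utterances of different durations map to vectors of the same size.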

Similar articles

The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection.

J Voice. 2024 Sep;38(5):975-982. doi: 10.1016/j.jvoice.2022.03.021. Epub 2022 Apr 27.

Analysis and Classification of Voice Pathologies Using Glottal Signal Parameters.

J Voice. 2016 Sep;30(5):549-56. doi: 10.1016/j.jvoice.2015.06.010. Epub 2015 Oct 23.

Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach.

J Voice. 2019 Sep;33(5):634-641. doi: 10.1016/j.jvoice.2018.02.003. Epub 2018 Mar 19.

Cepstral analysis of voice in unilateral adductor vocal fold palsy.

J Voice. 2011 May;25(3):326-9. doi: 10.1016/j.jvoice.2009.12.010. Epub 2010 Mar 25.

Hierarchical Classification and System Combination for Automatically Identifying Physiological and Neuromuscular Laryngeal Pathologies.

J Voice. 2017 May;31(3):384.e9-384.e14. doi: 10.1016/j.jvoice.2016.09.003. Epub 2016 Oct 12.

Acoustic analysis of four common voice diagnoses: moving toward disorder-specific assessment.

J Voice. 2014 Sep;28(5):582-8. doi: 10.1016/j.jvoice.2014.02.002. Epub 2014 May 28.

Predicting Voice Disorder Status From Smoothed Measures of Cepstral Peak Prominence Using Praat and Analysis of Dysphonia in Speech and Voice (ADSV).

J Voice. 2017 Sep;31(5):557-566. doi: 10.1016/j.jvoice.2017.01.006. Epub 2017 Feb 4.

Voice disorder discrimination using vowel acoustic measures in female speakers.

Int J Lang Commun Disord. 2024 Sep-Oct;59(5):2087-2102. doi: 10.1111/1460-6984.13081. Epub 2024 Jun 17.

Analysis of vocal fold function from acoustic data simultaneously recorded with high-speed endoscopy.

J Voice. 2012 Nov;26(6):726-33. doi: 10.1016/j.jvoice.2012.02.001. Epub 2012 May 25.
