Yagnavajjula Madhu Keerthana, Alku Paavo, Rao Krothapalli Sreenivasa, Mitra Pabitra
Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India; Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland.
Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland.
J Voice. 2025 May;39(3):757-763. doi: 10.1016/j.jvoice.2022.10.016. Epub 2022 Nov 21.
Neurogenic voice disorders (NVDs) are caused by damage to or malfunction of the central or peripheral nervous system that controls vocal fold movement. In this paper, we investigate the potential of Fisher vector (FV) encoding for the automatic detection of people with NVDs. FVs are used to aggregate frame-level features (local descriptors) into utterance-level features (global descriptors). At the frame level, we extract two popular cepstral representations from acoustic voice signals: Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction cepstral coefficients (PLPCCs). In addition, MFCC features are extracted from every frame of the glottal source signal, which is computed using a glottal inverse filtering (GIF) technique. The global descriptors derived from the local descriptors are used to train a support vector machine (SVM) classifier. Experiments are conducted using voice signals from 80 healthy speakers and 80 patients with NVDs (40 with spasmodic dysphonia (SD) and 40 with recurrent laryngeal nerve palsy (RLNP)) taken from the Saarbruecken voice disorder (SVD) database. The overall results indicate that FV encoding leads to better identification of people with NVDs than the de facto temporal encoding. Furthermore, the SVM trained on the combination of FVs derived from the cepstral and glottal features provides the best overall detection performance.
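To make the frame-to-utterance step concrete, here is a minimal NumPy sketch of Fisher vector encoding. It assumes a diagonal-covariance Gaussian mixture model (GMM) has already been trained on pooled frame-level features, and it applies the common power and L2 normalizations; the abstract does not state the GMM size, the normalization, or how the FVs are combined, so those choices (including fusion by simple concatenation) are assumptions. The function names `gmm_posteriors` and `fisher_vector` are hypothetical, and the random arrays merely stand in for MFCC frames and a trained GMM.

```python
import numpy as np


def gmm_posteriors(X, weights, means, variances):
    """Soft assignment gamma[t, k] of frame t to diagonal-covariance Gaussian k."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_lik = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_joint = log_lik + np.log(weights)[None, :]                # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)             # log-domain stabilization
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)


def fisher_vector(X, weights, means, variances):
    """Encode a (T, D) matrix of frame-level descriptors as one 2*K*D utterance-level vector."""
    T = X.shape[0]
    gamma = gmm_posteriors(X, weights, means, variances)         # (T, K)
    z = (X[:, None, :] - means[None]) / np.sqrt(variances)[None]  # whitened deviations (T, K, D)
    # Normalized gradients of the log-likelihood w.r.t. GMM means and standard deviations
    g_mu = np.einsum("tk,tkd->kd", gamma, z) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("tk,tkd->kd", gamma, z ** 2 - 1) / (T * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                        # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                          # L2 normalization


# Toy usage: random numbers stand in for real MFCC frames and a trained GMM.
rng = np.random.default_rng(0)
K, D = 2, 3
weights = np.array([0.4, 0.6])
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
# For brevity this toy reuses one GMM for both feature streams.
fv_voice = fisher_vector(rng.normal(size=(50, D)), weights, means, variances)
fv_glottal = fisher_vector(rng.normal(size=(40, D)), weights, means, variances)
fused = np.concatenate([fv_voice, fv_glottal])  # assumed fusion: simple concatenation
print(fv_voice.shape, fused.shape)  # (12,) (24,)
```

Whatever the number of frames T, each utterance maps to a fixed-length vector of size 2·K·D. This is what lets variable-length recordings be fed to an SVM. Because the encoding sums over frames, it does not depend on frame order.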