School of Mechanical Engineering, Southwest Jiaotong University, Chengdu, 610031, China.
J Acoust Soc Am. 2023 Jan;153(1):423. doi: 10.1121/10.0016869.
Intelligent, data-driven screening of pathological voice signals is a non-invasive, real-time tool for computer-aided diagnosis that has attracted increasing attention from researchers and clinicians. In this paper, the authors propose combining multi-domain features with the hierarchical extreme learning machine (H-ELM) for the automatic identification of voice disorders. A large set of candidate features is first extracted from the original voice signal through multi-domain feature extraction (i.e., time-domain features, sample entropy based on ensemble empirical mode decomposition, and gammatone frequency cepstral coefficients). To eliminate redundancy in the high-dimensional feature vectors, neighborhood component analysis is then applied to select the most sensitive features, which improves the efficiency of network training and reduces overfitting. The selected features are then used to train the H-ELM for pathological voice classification. In the experiments, the H-ELM achieved a sensitivity of 99.37%, a specificity of 98.61%, an F1 score of 99.37%, and an accuracy of 98.99%. These results indicate that the proposed method is feasible for the initial classification of pathological voice signals.
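The classification and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a basic single-hidden-layer extreme learning machine rather than the hierarchical H-ELM, it assumes that feature extraction and neighborhood component analysis have already produced a feature matrix, and it runs on synthetic data. The function names (`train_elm`, `predict_elm`, `screening_metrics`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def train_elm(X, y, n_hidden=50, reg=1e-2, seed=0):
    """Fit a basic single-hidden-layer ELM (an assumption; not the paper's H-ELM).

    The hidden weights are random and stay fixed; only the output weights
    are solved for, using ridge-regularised least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    t = 2.0 * y - 1.0                            # map labels {0, 1} to {-1, +1}
    # Output weights: beta = (H^T H + reg * I)^(-1) H^T t
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ t)
    return W, b, beta


def predict_elm(model, X):
    """Return 1 (pathological) or 0 (healthy) for each row of X."""
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, F1 score and accuracy for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


# Synthetic stand-in for selected voice features (not real patient data):
# two well-separated Gaussian clusters in five dimensions.
rng = np.random.default_rng(42)
X_healthy = rng.normal(loc=-2.0, size=(100, 5))
X_patho = rng.normal(loc=2.0, size=(100, 5))
X = np.vstack([X_healthy, X_patho])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

model = train_elm(X, y)
metrics = screening_metrics(y, predict_elm(model, X))
```

Here the metrics are computed on the training data purely to keep the sketch short; a realistic evaluation would use held-out recordings or cross-validation.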