Ali Zulfiqar, Elamvazuthi Irraivan, Alsulaiman Mansour, Muhammad Ghulam
Centre for Intelligent Signal and Imaging Research, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Tronoh 31750, Perak, Malaysia; Digital Speech Processing Group, Department of Computer Engineering, King Saud University, Riyadh 11543, Saudi Arabia.
Centre for Intelligent Signal and Imaging Research, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Tronoh 31750, Perak, Malaysia.
J Voice. 2016 Nov;30(6):757.e7-757.e19. doi: 10.1016/j.jvoice.2015.08.010. Epub 2015 Oct 27.
Automatic voice pathology detection using sustained vowels has been widely explored. Because the waveform of a sustained vowel is essentially stationary, pathology detection with a sustained vowel is comparatively easier than with running speech. Some disorder detection systems based on running speech have also been developed, but most of them rely on voice activity detection (VAD), which is itself a challenging task. Pathology detection with running speech needs further investigation, and systems with good accuracy (ACC) are required. Furthermore, pathology classification systems using running speech have not received any attention from the research community. In this article, automatic pathology detection and classification systems are developed using text-dependent running speech without a VAD module.
Three psychophysical properties of hearing (critical-band spectral estimation, the equal-loudness hearing curve, and the intensity-loudness power law of hearing) are used to estimate the auditory spectrum. The auditory spectrum and its all-pole models are computed, analyzed, and used as features in a Gaussian mixture model (GMM) that makes the automatic decision.
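The three hearing properties and the all-pole modelling step resemble the perceptual linear prediction (PLP) recipe. The sketch below is a minimal pure-Python illustration of that pipeline for one frame, assuming Hermansky-style PLP definitions (Bark warping 6·asinh(f/600), an asymmetric critical-band masking curve, a 40 dB equal-loudness approximation, and cube-root compression), an all-pole model fitted with the Levinson-Durbin recursion, and diagonal-covariance GMM scoring. The band count (17), model order (5), and every function name here are hypothetical choices, not parameters taken from the article.

```python
import cmath
import math

# Minimal PLP-style sketch; all names and parameter values are hypothetical.


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2, using a direct DFT so no external packages are needed."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def bark(f_hz):
    """Assumed Hz -> Bark warping (Hermansky-style): 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def masking(d):
    """Asymmetric critical-band curve; d is the Bark distance from the band centre."""
    if d < -1.3 or d > 2.5:
        return 0.0
    if d <= -0.5:
        return 10.0 ** (2.5 * (d + 0.5))
    if d < 0.5:
        return 1.0
    return 10.0 ** (-1.0 * (d - 0.5))


def equal_loudness(f_hz):
    """Assumed 40 dB equal-loudness approximation, with w = 2*pi*f in rad/s."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 * w2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def auditory_spectrum(frame, fs, n_bands=17):
    n = len(frame)
    # Hamming window before the DFT
    win = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    spec = power_spectrum([x * w for x, w in zip(frame, win)])
    bin_bark = [bark(k * fs / n) for k in range(len(spec))]
    top = bark(fs / 2.0)
    out = []
    for i in range(n_bands):
        c = top * (i + 1) / (n_bands + 1)  # band centres equally spaced in Bark
        # 1) critical-band spectral estimation
        energy = sum(p * masking(b - c) for p, b in zip(spec, bin_bark))
        # 2) equal-loudness weighting at the band centre (inverse Bark warp)
        energy *= equal_loudness(600.0 * math.sinh(c / 6.0))
        # 3) intensity-loudness power law: cube-root compression
        out.append(energy ** 0.33)
    return out


def all_pole(aud, order=5):
    """Fit A(z) = 1 + a1 z^-1 + ... + ap z^-p to the auditory spectrum (Levinson-Durbin)."""
    full = aud + aud[-2:0:-1]  # mirror into an even spectrum of length 2(m - 1)
    size = len(full)
    # Autocorrelation = inverse DFT of the (real, even) power spectrum
    r = [sum(full[k] * math.cos(2 * math.pi * k * lag / size) for k in range(size)) / size
         for lag in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        # Update a_j <- a_j + k * a_{i-j} using the previous coefficients, then set a_i = k
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def gmm_loglik(x, weights, means, variances):
    """log p(x) under a diagonal-covariance GMM, combined with log-sum-exp."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            lp -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        logs.append(lp)
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def classify(frame_features, normal_gmm, path_gmm):
    """Sum per-frame log-likelihoods under each class model and pick the larger."""
    ln = sum(gmm_loglik(f, *normal_gmm) for f in frame_features)
    lp = sum(gmm_loglik(f, *path_gmm) for f in frame_features)
    return "pathological" if lp > ln else "normal"
```

In a complete system, features like these would be computed for every frame of the running-speech recording, and each class GMM (given here as a `(weights, means, variances)` tuple) would typically be trained with expectation-maximization, which this sketch does not show.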
In experiments using the Massachusetts Eye & Ear Infirmary database, an ACC of 99.56% is obtained for pathology detection and an ACC of 93.33% for pathology classification. The proposed systems outperform existing running-speech-based systems.
The developed systems can be used effectively for voice pathology detection and classification, and the proposed features can visually differentiate between normal and pathological samples.