Multidirectional regression (MDR)-based features for automatic voice disorder detection.

Affiliation

Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

J Voice. 2012 Nov;26(6):817.e19-27. doi: 10.1016/j.jvoice.2012.05.002.

DOI:10.1016/j.jvoice.2012.05.002

PMID:23177748

Abstract

BACKGROUND AND OBJECTIVE

Objective assessment of voice pathology is attracting growing interest. Automatic speech/speaker recognition (ASR) systems are commonly deployed in voice pathology detection. The aim of this work was to develop a novel feature extraction method for ASR that incorporates the distributions of voiced and unvoiced parts, together with voice onset and offset characteristics, in a time-frequency domain to detect voice pathology.

MATERIALS AND METHODS

Speech samples from 70 dysphonic patients with six different types of voice disorders and from 50 normal subjects were analyzed. Spoken Arabic digits (1-10) were used as input. The proposed feature extraction method was embedded into an ASR system with a Gaussian mixture model (GMM) classifier to detect voice disorders.
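The abstract does not give the classifier details, so the following is only a minimal sketch of how a two-class GMM decision is commonly made: each class has its own mixture model, and an utterance is assigned to the class whose model gives its feature frames the higher total log-likelihood. The function names, the diagonal-covariance form, and the toy model parameters are all assumptions for illustration. They are not taken from the paper, and the feature vectors here are not MDR features.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under a mixture.

    gmm is a list of (weight, mean, variance) components.
    The log-sum-exp trick keeps the per-frame sum numerically stable.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total

def classify(frames, gmm_normal, gmm_disordered):
    """Pick the class whose model scores the utterance higher."""
    ll_normal = gmm_log_likelihood(frames, gmm_normal)
    ll_disordered = gmm_log_likelihood(frames, gmm_disordered)
    return "disordered" if ll_disordered > ll_normal else "normal"

# Hypothetical toy models (assumed values, two components each, 2-D features).
GMM_NORMAL = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
GMM_DISORDERED = [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]

if __name__ == "__main__":
    utterance = [[0.2, 0.1], [0.9, 1.1]]
    print(classify(utterance, GMM_NORMAL, GMM_DISORDERED))
```

In a real system the mixture parameters would be estimated from training data (typically with expectation-maximization) rather than written by hand as they are here.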

RESULTS

An accuracy of 97.48% was obtained in the text-independent case (training on all digits), and over 99% accuracy was obtained in the text-dependent case (training on each digit separately). The proposed method outperformed conventional Mel-frequency cepstral coefficient (MFCC) features.
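The difference between the two training regimes can be illustrated with a small sketch of how the training data might be grouped: pooled across all digits for the text-independent case, and split per digit for the text-dependent case. The sample tuple layout `(label, digit, features)` and the function names are assumptions made for this example, not the paper's data format.

```python
from collections import defaultdict

def text_independent_sets(samples):
    """Pool every digit together: one training set per class."""
    pooled = defaultdict(list)
    for label, _digit, features in samples:
        pooled[label].append(features)
    return dict(pooled)

def text_dependent_sets(samples):
    """Keep digits apart: one training set per (class, digit) pair."""
    per_digit = defaultdict(list)
    for label, digit, features in samples:
        per_digit[(label, digit)].append(features)
    return dict(per_digit)

# Hypothetical samples: the strings stand in for real feature matrices.
SAMPLES = [
    ("normal", 1, "n1a"),
    ("normal", 2, "n2a"),
    ("disordered", 1, "d1a"),
    ("disordered", 2, "d2a"),
    ("disordered", 1, "d1b"),
]

if __name__ == "__main__":
    print(text_independent_sets(SAMPLES))
    print(text_dependent_sets(SAMPLES))
```

Under this grouping, the text-independent case would train one model per class, while the text-dependent case would train one model per class for each digit.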

CONCLUSION

The results of this study revealed that incorporating voice onset and offset information leads to efficient automatic voice disorder detection.

Similar articles

On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices.

Logoped Phoniatr Vocol. 2011 Jul;36(2):60-9. doi: 10.3109/14015439.2010.528788. Epub 2010 Nov 12.

Automatic Voice Pathology Detection With Running Speech by Using Estimation of Auditory Spectrum and Cepstral Coefficients Based on the All-Pole Model.

J Voice. 2016 Nov;30(6):757.e7-757.e19. doi: 10.1016/j.jvoice.2015.08.010. Epub 2015 Oct 27.

Towards objective evaluation of perceived roughness and breathiness: an approach based on mel-frequency cepstral analysis.

Logoped Phoniatr Vocol. 2011 Jul;36(2):52-9. doi: 10.3109/14015439.2010.517551. Epub 2010 Sep 17.

Automatic intelligibility assessment of speakers after laryngeal cancer by means of acoustic modeling.

J Voice. 2012 May;26(3):390-7. doi: 10.1016/j.jvoice.2011.04.010. Epub 2011 Aug 5.

Intra- and Inter-database Study for Arabic, English, and German Databases: Do Conventional Speech Features Detect Voice Pathology?

J Voice. 2017 May;31(3):386.e1-386.e8. doi: 10.1016/j.jvoice.2016.09.009. Epub 2016 Oct 10.

Discrimination between pathological and normal voices using GMM-SVM approach.

J Voice. 2011 Jan;25(1):38-43. doi: 10.1016/j.jvoice.2009.08.002. Epub 2010 Feb 4.

The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection.

J Voice. 2024 Sep;38(5):975-982. doi: 10.1016/j.jvoice.2022.03.021. Epub 2022 Apr 27.

Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels.

J Voice. 2010 Sep;24(5):540-55. doi: 10.1016/j.jvoice.2008.12.014. Epub 2009 Nov 2.

Voice source characterization using pitch synchronous discrete cosine transform for speaker identification.

J Acoust Soc Am. 2015 Jun;137(6):EL469-75. doi: 10.1121/1.4921679.

Cited by

Patient State Recognition System for Healthcare Using Speech and Facial Expressions.

J Med Syst. 2016 Dec;40(12):272. doi: 10.1007/s10916-016-0627-x. Epub 2016 Oct 18.

Detection of Voice Pathology using Fractal Dimension in a Multiresolution Analysis of Normal and Disordered Speech Signals.

J Med Syst. 2016 Jan;40(1):20. doi: 10.1007/s10916-015-0392-2. Epub 2015 Nov 3.

Exploring the feasibility of smart phone microphone for measurement of acoustic voice parameters and voice pathology screening.

Eur Arch Otorhinolaryngol. 2015 Nov;272(11):3391-9. doi: 10.1007/s00405-015-3708-4. Epub 2015 Jul 11.
