Automatic intelligibility classification of sentence-level pathological speech.

Author information

Kim Jangwon, Kumar Naveen, Tsiartas Andreas, Li Ming, Narayanan Shrikanth S

Affiliations

Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA.

Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Ave., Los Angeles, CA 90089, USA; Department of Electrical Engineering, Computer Science, Linguistics and Psychology, University of Southern California (USC), 3620 McClintock Ave., Los Angeles, CA 90089, USA.

Publication information

Comput Speech Lang. 2015 Jan;29(1):132-144. doi: 10.1016/j.csl.2014.02.001.

DOI:10.1016/j.csl.2014.02.001

PMID:25414544

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4233325/

Abstract

Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects in pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusions and subsystem decision fusion for arriving at a final intelligibility decision. The performances are tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects' data among training and test partitions. Results show that the feature sets of each of the voice quality subsystem, prosodic subsystem, and pronunciation subsystem, offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes).
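The two ideas in the abstract that lend themselves to a short illustration are the evaluation metric (unweighted versus weighted average recall) and the posterior smoothing step. The sketch below is illustrative only. The recall functions follow the standard definitions. The smoothing function is an assumption: the abstract says only that a test sample's posterior is refined using the posteriors of other test samples in the acoustic space, so the k-nearest-neighbour rule, the Euclidean distance, and the names `smooth_posteriors`, `k`, and `alpha` are hypothetical choices, not the authors' published scheme.

```python
import numpy as np


def average_recalls(y_true, y_pred):
    """Return (unweighted, weighted) average recall over the classes in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes, counts = np.unique(y_true, return_counts=True)
    # Per-class recall: fraction of samples of class c that were predicted as c.
    recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
    uar = recalls.mean()                            # every class counts equally
    war = np.sum(recalls * counts) / counts.sum()   # classes weighted by their size
    return uar, war


def smooth_posteriors(features, posteriors, k=3, alpha=0.5):
    """Hypothetical smoothing rule (an assumption, not the paper's exact method).

    Each test sample's posterior is blended with the mean posterior of its
    k nearest other test samples, measured by Euclidean distance in feature space.
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    # Pairwise Euclidean distances between all test samples.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest samples
    neighbour_mean = posteriors[neighbours].mean(axis=1)
    return (1.0 - alpha) * posteriors + alpha * neighbour_mean
```

For labels `[0, 0, 0, 1]` and predictions `[0, 0, 1, 1]`, the per-class recalls are 2/3 and 1, so the unweighted average recall is 5/6 while the weighted average recall is 3/4 (equal to plain accuracy). The gap between the two shows why both figures are reported when the binary classes are imbalanced.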
