Automatic Speech Recognition of Conversational Speech in Individuals With Disordered Speech.

Affiliations

Google LLC, Mountain View, CA.

MND Association, Northampton, United Kingdom.

Publication information

J Speech Lang Hear Res. 2024 Nov 7;67(11):4176-4185. doi: 10.1044/2024_JSLHR-24-00045. Epub 2024 Jul 4.

DOI:10.1044/2024_JSLHR-24-00045

PMID:38963790

Abstract

PURPOSE

This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy.

METHOD

Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (a) one speaker-independent ASR system trained and optimized for typical speech and (b) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word error rates were calculated for each speech model, read versus conversational, and subject. Linear mixed-effects models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap.
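Word error rate (WER), the metric named above, is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring pipeline; the function name `word_error_rate`, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: text is normalized by lowercasing and whitespace splitting only;
    a real scoring pipeline would likely normalize punctuation and numerals too.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, for illustration only (not data from the study).
read_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
conversational_wer = word_error_rate("a b c d", "a x c")
gap = conversational_wer - read_wer  # read-versus-conversational difference
```

Computing this per utterance and grouping by speaker, speech mode, and model type would give the kind of per-condition comparison the abstract describes, under the assumption that the study used the conventional WER definition.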

RESULTS

We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy.

CONCLUSIONS

We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.
