Automatic Speech Recognition in Primary Progressive Apraxia of Speech.

Affiliations

Department of Neurology, Mayo Clinic, Rochester, MN.

Department of Radiology, Mayo Clinic, Rochester, MN.

Publication information

J Speech Lang Hear Res. 2024 Sep 12;67(9):2964-2976. doi: 10.1044/2024_JSLHR-24-00049. Epub 2024 Aug 6.

DOI:10.1044/2024_JSLHR-24-00049

PMID:39265154

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11427443/

Abstract

INTRODUCTION

Transcribing disordered speech can be useful when diagnosing motor speech disorders such as primary progressive apraxia of speech (PPAOS), in which patients produce sound additions, deletions, and substitutions, or distortions and/or slow, segmented speech. Because transcription is laborious and requires an experienced listener, using automatic speech recognition (ASR) systems for diagnosis and treatment monitoring is appealing. This study evaluated the efficacy of a readily available ASR system (wav2vec 2.0) in transcribing the speech of patients with PPAOS to determine whether the word error rate (WER) output by the ASR can differentiate between healthy speech and PPAOS and/or among its subtypes, whether WER correlates with AOS severity, and how the ASR's errors compare to those noted in manual transcriptions.

METHOD

Forty-five patients with PPAOS and 22 healthy controls were recorded repeating 13 words, 3 times each, which were transcribed manually and using wav2vec 2.0. The WER and phonetic and prosodic speech errors were compared between groups, and ASR results were compared against manual transcriptions.
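
As an illustration of the metric used here, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch under that standard definition; the abstract does not state the scoring or text normalization the authors used, and the function name and example words are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: lowercase, whitespace-tokenized text; the study's own
    normalization is not described in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis needs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical single-word stimulus and ASR outputs (not taken from the study).
exact = word_error_rate("catastrophe", "catastrophe")
split = word_error_rate("catastrophe", "cat as trophy")
```

With single-word stimuli the reference length is 1, so each token scores 0 when recognized exactly and 1 or more otherwise; an output that splits one target into several words counts one substitution plus the extra insertions, which is why WER can exceed 1.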

RESULTS

Mean overall WER was 0.88 for patients and 0.33 for controls. WER significantly correlated with AOS severity and accurately distinguished between patients and controls but not between AOS subtypes. The phonetic and prosodic errors from the ASR transcriptions were also unable to distinguish between subtypes, whereas errors calculated from human transcriptions were. There was poor agreement in the number of phonetic and prosodic errors between the ASR and human transcriptions.

CONCLUSIONS

This study demonstrates that ASR can be useful in differentiating healthy from disordered speech and evaluating PPAOS severity but does not distinguish PPAOS subtypes. ASR transcriptions showed weak agreement with human transcriptions; thus, ASR may be a useful tool for the transcription of speech in PPAOS, but the research questions posed must be carefully considered within the context of its limitations.

SUPPLEMENTAL MATERIAL

https://doi.org/10.23641/asha.26359417.

Similar articles

Characterizing Speech Errors Across Primary Progressive Apraxia of Speech Subtypes.

J Speech Lang Hear Res. 2024 Mar 11;67(3):811-820. doi: 10.1044/2023_JSLHR-23-00577. Epub 2024 Feb 20.

Prosodic and phonetic subtypes of primary progressive apraxia of speech.

Brain Lang. 2018 Sep;184:54-65. doi: 10.1016/j.bandl.2018.06.004. Epub 2018 Jul 4.

A Longitudinal Evaluation of Speech Rate in Primary Progressive Apraxia of Speech.

J Speech Lang Hear Res. 2021 Feb 17;64(2):392-404. doi: 10.1044/2020_JSLHR-20-00253. Epub 2021 Jan 21.

The agreement of phonetic transcriptions between paediatric speech and language therapists transcribing a disordered speech sample.

Int J Lang Commun Disord. 2024 Sep-Oct;59(5):1981-1995. doi: 10.1111/1460-6984.13043. Epub 2024 Jun 8.

Clinical Progression in Four Cases of Primary Progressive Apraxia of Speech.

Am J Speech Lang Pathol. 2018 Nov 21;27(4):1303-1318. doi: 10.1044/2018_AJSLP-17-0227.

Syndromes dominated by apraxia of speech show distinct characteristics from agrammatic PPA.

Neurology. 2013 Jul 23;81(4):337-45. doi: 10.1212/WNL.0b013e31829c5ed5. Epub 2013 Jun 26.

Progression of Motor Speech Function in Speakers With Primary Progressive Apraxia of Speech.

J Speech Lang Hear Res. 2024 Dec 9;67(12):4651-4662. doi: 10.1044/2024_JSLHR-24-00283. Epub 2024 Nov 15.

A Preliminary Look Into the Clinical Evolution of Motor Speech Characteristics in Primary Progressive Apraxia of Speech in Québec French.

Am J Speech Lang Pathol. 2021 Jun 18;30(3S):1459-1476. doi: 10.1044/2020_AJSLP-20-00162. Epub 2021 Mar 9.

Temporal acoustic measures distinguish primary progressive apraxia of speech from primary progressive aphasia.

Brain Lang. 2017 May;168:84-94. doi: 10.1016/j.bandl.2017.01.012. Epub 2017 Feb 7.

Cited by

Automated Assessment of Word- and Sentence-Level Speech Intelligibility in Developmental Motor Speech Disorders: A Cross-Linguistic Investigation.

Diagnostics (Basel). 2025 Jul 28;15(15):1892. doi: 10.3390/diagnostics15151892.
