BACKGROUND

Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability.

OBJECTIVE

This study aimed to evaluate the performance of an automatic speech recognition (ASR) model by comparing its evaluation outcomes with those of SLPs on two standardized SSD assessments used in South Korea.

METHODS

A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children's voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments, the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP), were used, with ASR transcriptions compared to SLP transcriptions.
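The comparison of ASR and SLP transcriptions can be illustrated with a small sketch. The abstract does not state how discrepancies were counted, so the Levenshtein (edit distance) alignment below is an assumption, and the function names and the romanized phoneme sequences are hypothetical, for illustration only.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]
) -> float:
    """Total edit distance divided by total reference phonemes.

    Each pair is (SLP transcription, ASR transcription); the SLP
    transcription is treated as the reference (an assumption).
    """
    pairs = list(pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference phonemes")
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return errors / total


# Hypothetical example: one substitution across nine reference phonemes.
example_pairs = [
    (["s", "a", "g", "w", "a"], ["t", "a", "g", "w", "a"]),
    (["n", "a", "m", "u"], ["n", "a", "m", "u"]),
]
example_per = phoneme_error_rate(example_pairs)
```

Under this definition, an insertion or deletion counts the same as a substitution; a study could reasonably weight them differently or count only position-wise mismatches.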

RESULTS

This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates for the APAC and U-TAP were 8.42% (457/5430) and 8.91% (402/4514), respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58% (327/3090) and 11.86% (331/2790) for the APAC and U-TAP, respectively. On average, there were 2.60 (SD 1.54) and 3.07 (SD 1.39) discrepancies per child for correctly produced phonemes, and 7.87 (SD 3.66) and 7.57 (SD 4.85) discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct was excellent, with intraclass correlation coefficients of 0.984 (95% CI 0.953-0.994) and 0.978 (95% CI 0.941-0.990) for the APAC and U-TAP, respectively. The z scores between SLPs and ASR showed more pronounced differences with the APAC than the U-TAP, with 8 individuals showing discrepancies in the APAC compared to 2 in the U-TAP.
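As a quick arithmetic check, the reported error rates follow directly from the discrepancy counts and phoneme totals given above. The short sketch below recomputes them; the dictionary keys and the helper name `rate_percent` are illustrative only, and the numbers are taken from the abstract.

```python
# (discrepancies, total phonemes, reported percentage) from the abstract
reported = {
    "APAC phoneme": (457, 5430, 8.42),
    "U-TAP phoneme": (402, 4514, 8.91),
    "APAC consonant": (327, 3090, 10.58),
    "U-TAP consonant": (331, 2790, 11.86),
}


def rate_percent(discrepancies: int, total: int) -> float:
    """Error rate as a percentage, rounded to two decimals."""
    return round(100 * discrepancies / total, 2)


# Recompute each rate from its counts.
recomputed = {
    name: rate_percent(d, t) for name, (d, t, _) in reported.items()
}

for name, (_, _, stated) in reported.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}%, reported {stated:.2f}%")
```

Each recomputed value matches the reported percentage to two decimal places.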

CONCLUSIONS

The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to strengthen the model's refinement and ensure broader clinical applicability.

Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study.
