A Tunable Forced Alignment System Based on Deep Learning: Applications to Child Speech.

Authors

Kadambi Prad, Mahr Tristan J, Hustad Katherine C, Berisha Visar

Affiliations

School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe.

College of Health Solutions, Arizona State University, Tempe.

Publication information

J Speech Lang Hear Res. 2025 Jul 29;68(7S):3583-3601. doi: 10.1044/2024_JSLHR-24-00347. Epub 2025 Mar 31.

DOI: 10.1044/2024_JSLHR-24-00347
PMID: 40163771

Abstract

PURPOSE

Phonetic forced alignment has a multitude of applications in automated analysis of speech, particularly in studying nonstandard speech such as children's speech. Manual alignment is tedious but serves as the gold standard for clinical-grade alignment. Current tools do not support direct training on manual alignments. Thus, a trainable speaker adaptive phonetic forced alignment system, Wav2TextGrid, was developed for children's speech. The source code for the method is publicly available along with a graphical user interface at https://github.com/pkadambi/Wav2TextGrid.

METHOD

We propose a trainable, speaker-adaptive, neural forced aligner developed using a corpus of 42 neurotypical children from 3 to 6 years of age. Evaluation on both child speech and on the TIMIT corpus was performed to demonstrate aligner performance across age and dialectal variations.
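The abstract does not describe how Wav2TextGrid turns network outputs into phoneme boundaries. As general background only, the sketch below shows a textbook Viterbi-style forced alignment: given per-frame log-probabilities for each phoneme and the known phoneme sequence, it finds the monotonic frame-to-phoneme assignment with the highest total score. The function name `forced_align`, the per-frame dictionary format, and the toy probabilities are all assumptions made for illustration and are not taken from the paper or its code.

```python
import math


def forced_align(log_probs, phones):
    """Align a known phoneme sequence to frames (generic Viterbi sketch).

    log_probs: list with one dict per frame, mapping phone label -> log-probability.
    phones:    phoneme labels in the order they are spoken.
    Returns a list of (phone, start_frame, end_frame_exclusive) tuples.
    """
    T, N = len(log_probs), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phoneme")

    NEG = -math.inf
    # score[t][n]: best total log-prob with frame t assigned to phones[n]
    score = [[NEG] * N for _ in range(T)]
    # advanced[t][n]: True if the best path entered phones[n] exactly at frame t
    advanced = [[False] * N for _ in range(T)]
    score[0][0] = log_probs[0][phones[0]]

    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else NEG
            if move > stay:
                best, advanced[t][n] = move, True
            else:
                best = stay
            if best > NEG:
                score[t][n] = best + log_probs[t][phones[n]]

    # Backtrace from the last frame of the last phoneme to recover onsets.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][n]:
            starts[n] = t
            n -= 1
    ends = starts[1:] + [T]
    return list(zip(phones, starts, ends))


# Toy per-frame probabilities (invented) for the word "cat" = k, ae, t.
toy_probs = [
    {"k": 0.8, "ae": 0.1, "t": 0.1},
    {"k": 0.7, "ae": 0.2, "t": 0.1},
    {"k": 0.2, "ae": 0.7, "t": 0.1},
    {"k": 0.1, "ae": 0.8, "t": 0.1},
    {"k": 0.1, "ae": 0.3, "t": 0.6},
    {"k": 0.1, "ae": 0.1, "t": 0.8},
]
frames = [{p: math.log(v) for p, v in frame.items()} for frame in toy_probs]
alignment = forced_align(frames, ["k", "ae", "t"])
print(alignment)  # [('k', 0, 2), ('ae', 2, 4), ('t', 4, 6)]
```

Multiplying the frame indices by the frame shift of the acoustic model (for example 10 or 20 ms, depending on the front end) would convert these spans into boundary times.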

RESULTS

The trainable alignment tool markedly improved accuracy over baseline for several alignment quality metrics, for all phoneme categories. Accuracy for plosives and affricates in children's speech improved more than 40% over baseline. Performance matched existing methods using approximately 13 min of labeled data, while approximately 45-60 min of labeled alignments yielded significant improvement.
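The abstract does not name the alignment quality metrics it reports. One measure commonly used in the forced-alignment literature is the share of predicted phoneme boundaries that fall within a fixed tolerance of the manual boundaries. The sketch below illustrates that idea; the function name `boundary_accuracy`, the 20 ms default tolerance, and the example onset times are assumptions for illustration, not values from the paper.

```python
def boundary_accuracy(reference, predicted, tolerance=0.020):
    """Fraction of predicted onsets within `tolerance` seconds of the manual onsets.

    reference, predicted: equal-length lists of onset times in seconds,
    one entry per phoneme, in the same order.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("no boundaries to compare")
    hits = sum(1 for r, p in zip(reference, predicted) if abs(r - p) <= tolerance)
    return hits / len(reference)


# Invented onset times (seconds) for four phonemes.
manual_onsets = [0.00, 0.12, 0.31, 0.47]
aligner_onsets = [0.01, 0.15, 0.32, 0.46]
print(boundary_accuracy(manual_onsets, aligner_onsets))  # 0.75
```

Here three of the four predicted onsets lie within 20 ms of the manual ones; the second is off by 30 ms and counts as a miss.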

CONCLUSION

The Wav2TextGrid tool allows alternate alignment workflows where the forced alignments, via training, are directly tailored to match clinical-grade, manually provided alignments.

SUPPLEMENTAL MATERIAL

https://doi.org/10.23641/asha.28593971.
