Audio Augmentation for Non-Native Children's Speech Recognition through Discriminative Learning.

Authors

Radha Kodali, Bansal Mohan

Affiliation

School of Electronics Engineering, VIT-AP University, Amaravati 522237, India.

Publication information

Entropy (Basel). 2022 Oct 19;24(10):1490. doi: 10.3390/e24101490.

DOI:10.3390/e24101490
PMID:37420510
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9601443/
Abstract

Automatic speech recognition (ASR) for children is a rapidly evolving field: children are increasingly accustomed to interacting with virtual assistants such as Amazon Echo, Cortana, and other smart speakers, which has advanced human-computer interaction in recent generations. Non-native children, however, exhibit a diverse range of reading errors during second language (L2) acquisition, such as lexical disfluency, hesitations, intra-word switching, and word repetitions. These errors are not yet addressed, so ASR systems struggle to recognize non-native children's speech. The main objective of this study is to develop a non-native children's speech recognition system on top of feature-space discriminative models, such as feature-space maximum mutual information (fMMI) and boosted feature-space maximum mutual information (fbMMI). Applying speed perturbation-based data augmentation to the original children's speech corpora, in combination with these models, yields effective performance. The corpus covers different speaking styles of children, including both read speech and spontaneous speech, in order to investigate the impact of non-native children's L2 speaking proficiency on speech recognition systems. The experiments revealed that feature-space MMI models with steadily increasing speed perturbation factors outperform traditional ASR baseline models.
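
The speed perturbation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the common resampling form of speed perturbation (which changes tempo and pitch together), uses linear interpolation for simplicity, and the factors 0.9, 1.0, and 1.1 as well as the function names `speed_perturb` and `augment` are illustrative assumptions rather than values or names taken from the paper.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform so that it plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Linear interpolation is an assumption made for brevity; production
    toolkits typically use a band-limited resampler instead.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)


def augment(signal: np.ndarray, factors=(0.9, 1.0, 1.1)) -> dict:
    """Return one perturbed copy of `signal` per factor (assumed factors)."""
    return {f: speed_perturb(signal, f) for f in factors}


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sampling rate in Hz
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 220.0 * t)  # one second of a 220 Hz tone
    for factor, wav in augment(tone).items():
        print(f"factor={factor}: {len(wav)} samples")
```

Each perturbed copy would be added to the training set alongside the original utterance, so three factors roughly triple the amount of acoustic training data.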
