Audio Augmentation for Non-Native Children's Speech Recognition through Discriminative Learning.

Author information

Radha Kodali, Bansal Mohan

Affiliation

School of Electronics Engineering, VIT-AP University, Amaravati 522237, India.

Publication information

Entropy (Basel). 2022 Oct 19;24(10):1490. doi: 10.3390/e24101490.

DOI:10.3390/e24101490

PMID:37420510

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9601443/

Abstract

Automatic speech recognition (ASR) for children is a rapidly evolving field: children are increasingly accustomed to interacting with virtual assistants and smart speakers such as Amazon Echo and Cortana, and child ASR has advanced human-computer interaction in recent generations. Furthermore, non-native children are observed to exhibit a diverse range of reading errors during second language (L2) acquisition, such as lexical disfluencies, hesitations, intra-word switching, and word repetitions. These errors are not yet addressed, so ASR systems struggle to recognize non-native children's speech. The main objective of this study is to develop a non-native children's speech recognition system on top of feature-space discriminative models, such as feature-space maximum mutual information (fMMI) and boosted feature-space maximum mutual information (fbMMI). Combining these models with speed perturbation-based data augmentation of the original children's speech corpora improves recognition performance. The corpus covers children's different speaking styles, including both read and spontaneous speech, in order to investigate the impact of non-native children's L2 speaking proficiency on speech recognition systems. The experiments show that feature-space MMI models trained with steadily increasing speed perturbation factors outperform traditional ASR baseline models.
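
The abstract does not spell out how speed perturbation works. As commonly implemented (for example through the sox `speed` effect used in Kaldi recipes), each utterance is resampled by a factor α so that it plays α times faster at the original sample rate, shifting both tempo and pitch, and the perturbed copies are added to the training set. The sketch below is illustrative only and is not the authors' code: `speed_perturb` and `augment_corpus` are hypothetical helpers, the factors 0.9/1.0/1.1 are a common default rather than the values used in this study, and linear interpolation stands in for the band-limited resampler a real pipeline would use.

```python
import numpy as np


def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Resample `waveform` so it plays `factor` times faster at the same sample rate.

    factor > 1 shortens the signal (faster tempo, higher pitch);
    factor < 1 lengthens it (slower tempo, lower pitch).
    Linear interpolation is used for brevity; it is not band-limited,
    so a real pipeline would use a proper resampler (e.g. sox) instead.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(waveform)
    n_out = int(round(n_in / factor))
    # Output sample k reads the original signal at fractional position k * factor.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(n_in), waveform)


def augment_corpus(utterances, factors=(0.9, 1.0, 1.1)):
    """Return one speed-perturbed copy of every (utt_id, waveform) pair per factor.

    The "sp<factor>-" ID prefix is a hypothetical naming convention.
    """
    augmented = []
    for utt_id, wav in utterances:
        for f in factors:
            augmented.append((f"sp{f}-{utt_id}", speed_perturb(wav, f)))
    return augmented


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic 440 Hz tone
    for utt_id, wav in augment_corpus([("utt1", tone)]):
        print(utt_id, len(wav))
```

For the 1 s, 16 kHz input the three copies are 17778, 16000 and 14545 samples long. Keeping the unperturbed (factor 1.0) copy alongside the perturbed ones is what triples the training data in this kind of recipe.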
