Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations.

Author information

Shivakumar Prashanth Gurunath, Georgiou Panayiotis

Affiliation

Signal Processing for Communication Understanding & Behavior Analysis (SCUBA) Lab, University of Southern California, Los Angeles, California, USA.

Publication information

Comput Speech Lang. 2020 Sep;63. doi: 10.1016/j.csl.2020.101077. Epub 2020 Feb 18.

Abstract

Children's speech recognition is challenging mainly because of the inherently high variability in children's physical and articulatory characteristics and expressions. This variability manifests in both acoustic constructs and linguistic usage, owing to the rapidly changing developmental stages of childhood. Part of the challenge also stems from the lack of large amounts of children's speech data for efficient modeling. This work addresses these key challenges using transfer learning from adult models to children's models in a Deep Neural Network (DNN) framework for children's Automatic Speech Recognition (ASR), evaluated on multiple large-vocabulary children's speech corpora. The paper presents a systematic and extensive analysis of the proposed transfer learning technique, considering the key factors affecting children's speech recognition identified in prior literature. Evaluations are presented on (i) comparisons of earlier GMM-HMM and newer DNN models, (ii) the effectiveness of standard adaptation techniques versus transfer learning, and (iii) various adaptation configurations for tackling the variabilities present in children's speech, in terms of (a) acoustic spectral variability and (b) pronunciation variability and linguistic constraints. Our analysis spans (i) the number of DNN model parameters (for adaptation), (ii) the amount of adaptation data, (iii) the ages of children, and (iv) age-dependent versus age-independent adaptation. Finally, we provide recommendations on (i) the favorable strategies over the various analyzed parameters, and (ii) potential future research directions and relevant challenges persisting in DNN-based ASR for children's speech.
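To make "number of DNN model parameters (for adaptation)" concrete, here is a minimal sketch that assumes a plain fully connected acoustic model in which adaptation means re-training only k layers on child speech while the remaining layers stay frozen at their adult-trained values. The layer sizes in `HYPOTHETICAL_SIZES` and the helper names are illustrative assumptions, not the paper's architecture or code; the sketch only counts how many parameters each configuration would update.

```python
from typing import List

# Hypothetical feed-forward acoustic model: spliced input features -> 5 hidden
# layers -> output targets. These sizes are illustrative assumptions only.
HYPOTHETICAL_SIZES: List[int] = [440, 1024, 1024, 1024, 1024, 1024, 2000]


def layer_param_counts(layer_sizes: List[int]) -> List[int]:
    """Weights plus biases of each fully connected layer, input side first."""
    return [n_in * n_out + n_out
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def trainable_mask(num_layers: int, k: int, from_output: bool = True) -> List[bool]:
    """True for layers re-trained on child speech, False for layers kept frozen."""
    if not 0 <= k <= num_layers:
        raise ValueError(f"k must be in [0, {num_layers}], got {k}")
    if from_output:
        # Adapt the k layers closest to the output.
        return [i >= num_layers - k for i in range(num_layers)]
    # Adapt the k layers closest to the input.
    return [i < k for i in range(num_layers)]


def adapted_param_count(layer_sizes: List[int], k: int, from_output: bool = True) -> int:
    """Number of parameters re-estimated when adapting k layers."""
    counts = layer_param_counts(layer_sizes)
    mask = trainable_mask(len(counts), k, from_output)
    return sum(c for c, trainable in zip(counts, mask) if trainable)


if __name__ == "__main__":
    total = sum(layer_param_counts(HYPOTHETICAL_SIZES))
    num_layers = len(HYPOTHETICAL_SIZES) - 1
    for k in range(num_layers + 1):
        n = adapted_param_count(HYPOTHETICAL_SIZES, k)
        print(f"adapt top {k} layer(s): {n:>9,d} params ({n / total:.1%} of model)")
```

Adapting layers near the output versus near the input changes the parameter budget very differently under these assumed sizes, which is one reason the amount of available child adaptation data and the choice of adapted layers are naturally analyzed together.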

Similar articles

Improving Acoustic Models in TORGO Dysarthric Speech Database.
IEEE Trans Neural Syst Rehabil Eng. 2018 Mar;26(3):637-645. doi: 10.1109/TNSRE.2018.2802914.

Cited by

HiACC: Hinglish adult & children code-switched corpus.
Data Brief. 2025 Jul 17;62:111886. doi: 10.1016/j.dib.2025.111886. eCollection 2025 Oct.
REFINING AUTOMATIC SPEECH RECOGNITION SYSTEM FOR OLDER ADULTS.
Proc IEEE Int Conf Acoust Speech Signal Process. 2021 Jun;2021:7003-7007. doi: 10.1109/icassp39728.2021.9414207. Epub 2021 May 13.
Tracking Child Language Development With Neural Network Language Models.
Front Psychol. 2021 Jul 8;12:674402. doi: 10.3389/fpsyg.2021.674402. eCollection 2021.
Integrating Machine Learning with Human Knowledge.
iScience. 2020 Oct 9;23(11):101656. doi: 10.1016/j.isci.2020.101656. eCollection 2020 Nov 20.
