Domain Generalization for Language-Independent Automatic Speech Recognition.

Author information

Gao Heting, Ni Junrui, Zhang Yang, Qian Kaizhi, Chang Shiyu, Hasegawa-Johnson Mark

Affiliations

Department of Electrical and Computer Engineering (ECE), Beckman Institute, University of Illinois, Urbana, IL, United States.

MIT-IBM Watson AI Lab, Cambridge, MA, United States.

Publication information

Front Artif Intell. 2022 May 12;5:806274. doi: 10.3389/frai.2022.806274. eCollection 2022.

DOI:10.3389/frai.2022.806274

PMID:35647534

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9133481/

Abstract

A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than those on which it was trained. Language-independent ASR is difficult to train because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the International Phonetic Alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related to (shares a language family with) a language in the training set, language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition.
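
To make the contrast between empirical risk minimization and invariant risk minimization concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used IRMv1 form of the penalty (the squared gradient of each environment's risk with respect to a dummy classifier scale fixed at 1), treats each training language as one environment, and uses per-token cross-entropy in place of the connectionist temporal classification loss mentioned in the abstract. All function names and the `penalty_weight` parameter are hypothetical, and regret minimization is not covered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def risk_and_dummy_grad(batch):
    """Mean cross-entropy of one environment and its derivative with
    respect to a dummy scale w, evaluated at w = 1.

    batch: list of (logits, label) pairs from one training language.
    For loss = CE(w * logits, label), the derivative at w = 1 is
    sum_k (softmax(logits)[k] - onehot(label)[k]) * logits[k].
    """
    risk, grad = 0.0, 0.0
    for logits, label in batch:
        probs = softmax(logits)
        risk += -math.log(probs[label])
        grad += sum(
            (p - (1.0 if k == label else 0.0)) * z
            for k, (p, z) in enumerate(zip(probs, logits))
        )
    n = len(batch)
    return risk / n, grad / n


def erm_objective(envs):
    """ERM (simplified): average the per-language risks, no regularizer."""
    return sum(risk_and_dummy_grad(b)[0] for b in envs) / len(envs)


def irm_objective(envs, penalty_weight=1.0):
    """IRMv1-style objective (assumed form): each language's risk plus a
    penalty on the squared dummy-scale gradient, averaged over languages."""
    total = 0.0
    for b in envs:
        risk, grad = risk_and_dummy_grad(b)
        total += risk + penalty_weight * grad ** 2
    return total / len(envs)


if __name__ == "__main__":
    # Two toy "languages", each with two phone tokens over three classes.
    envs = [
        [([2.0, 0.5, -1.0], 0), ([0.1, 1.2, 0.3], 1)],
        [([0.0, 0.0, 3.0], 2), ([1.0, -1.0, 0.5], 1)],
    ]
    print("ERM objective:", erm_objective(envs))
    print("IRM objective:", irm_objective(envs, penalty_weight=10.0))
```

With `penalty_weight=0.0` the two objectives coincide. A positive weight adds a term that is zero only when the shared classifier scale is simultaneously stationary for every training language, which is the sense in which the penalty favors language-invariant representations.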
