Domain Generalization for Language-Independent Automatic Speech Recognition.

Authors

Gao Heting, Ni Junrui, Zhang Yang, Qian Kaizhi, Chang Shiyu, Hasegawa-Johnson Mark

Affiliations

Department of Electrical and Computer Engineering (ECE), Beckman Institute, University of Illinois, Urbana, IL, United States.

MIT-IBM Watson AI Lab, Cambridge, MA, United States.

Publication information

Front Artif Intell. 2022 May 12;5:806274. doi: 10.3389/frai.2022.806274. eCollection 2022.

DOI: 10.3389/frai.2022.806274
PMID: 35647534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9133481/
Abstract

A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than those on which it was trained. Language-independent ASR is difficult to train because different languages implement phones differently: even when phonemes in two languages are written with the same symbols of the International Phonetic Alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that language independence may be approximated in different ways, depending on the size of the training set, the presence or absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and every language in the test set is related to (shares a language family with) a language in the training set, language-independent ASR may be trained using an empirical risk minimization strategy (e.g., connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from that family, the best performance is achieved with domain-invariant representation learning strategies. Two such strategies are tested in this article: invariant risk minimization and regret minimization. We find that invariant risk minimization is better at phone token classification (given known segment boundary times), while regret minimization is better at phone token recognition.
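
To make the contrast between empirical risk minimization and invariant risk minimization concrete, here is a minimal sketch in plain Python. It is not the article's implementation. It assumes the widely used IRMv1 form of the penalty (the squared gradient of each environment's risk with respect to a dummy scale w, evaluated at w = 1), a binary logistic loss in place of the phone-level losses used for ASR, and invented toy data in which each "environment" stands for one training language. The names `risk`, `irm_penalty`, `erm_objective`, `irm_objective`, and `toy_envs` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def risk(scores, labels, w=1.0):
    """Mean logistic loss of the scaled scores w * s against labels in {-1, +1}."""
    losses = [math.log1p(math.exp(-y * w * s)) for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


def irm_penalty(scores, labels):
    """IRMv1-style penalty (assumed form): squared d(risk)/dw evaluated at w = 1."""
    grads = [-y * s * sigmoid(-y * s) for s, y in zip(scores, labels)]
    return (sum(grads) / len(grads)) ** 2


def erm_objective(envs):
    """ERM variant used here: the plain average of per-environment risks."""
    return sum(risk(s, y) for s, y in envs.values()) / len(envs)


def irm_objective(envs, lam=1.0):
    """Per-environment risk plus lam times the invariance penalty, averaged."""
    total = sum(risk(s, y) + lam * irm_penalty(s, y) for s, y in envs.values())
    return total / len(envs)


# Hypothetical toy data: classifier scores and binary labels per training language.
toy_envs = {
    "lang_a": ([2.0, -1.5, 0.5], [1, -1, 1]),
    "lang_b": ([0.3, -0.2, 1.0], [1, -1, -1]),
}

if __name__ == "__main__":
    print("ERM objective:", erm_objective(toy_envs))
    print("IRM objective:", irm_objective(toy_envs, lam=10.0))
    for name, (s, y) in toy_envs.items():
        print(name, "penalty:", irm_penalty(s, y))
```

With `lam=0.0` the IRM objective reduces to the ERM objective. A larger `lam` penalizes a representation more heavily when rescaling its scores would lower the risk in some environment, which is the sense in which the penalty encourages a predictor that is simultaneously optimal across training languages.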

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/537b/9133481/fea36a5c85f6/frai-05-806274-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/537b/9133481/b5612ffed400/frai-05-806274-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/537b/9133481/7854736893e8/frai-05-806274-g0002.jpg
