
Multilingual end-to-end ASR for low-resource Turkic languages with common alphabets.

Affiliations

Satbayev University, Almaty, Kazakhstan.

Narxoz University, Almaty, Kazakhstan.

Publication information

Sci Rep. 2024 Jun 15;14(1):13835. doi: 10.1038/s41598-024-64848-1.

DOI:10.1038/s41598-024-64848-1
PMID:38879705
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11180099/
Abstract

Training a reliable and accurate automatic speech recognition (ASR) model requires a sufficient amount of transcribed audio data. Many of the world's languages, especially the agglutinative languages of the Turkic family, lack this type of data. Many studies have sought better models for low-resource languages using different approaches, the most popular of which are multilingual training and transfer learning. In this study, we combined five agglutinative languages from the Turkic family (Kazakh, Bashkir, Kyrgyz, Sakha, and Tatar) for multilingual training using connectionist temporal classification and an attention mechanism, including a language model, because these languages share cognate words, sentence formation rules, and an alphabet (Cyrillic). Data from the open-source Common Voice database were used to make the experiments reproducible. The experiments showed that multilingual training could improve ASR performance for all languages included in the experiment except Bashkir. A dramatic result was achieved for Kyrgyz: the word error rate decreased to nearly one-fifth and the character error rate decreased to one-fourth, which proves that this approach can be helpful for critically low-resource languages.
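
The abstract reports its results as word error rate (WER) and character error rate (CER). As a minimal sketch of how these two metrics are commonly computed (not the authors' evaluation code), the Python below divides the Levenshtein edit distance between a reference transcript and a hypothesis by the reference length. The function names `edit_distance`, `wer`, and `cer` are illustrative, the example strings are toy English data rather than Common Voice transcripts, and dropping spaces before computing CER is an assumption, since conventions differ between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, with spaces removed (an assumed convention)."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


if __name__ == "__main__":
    # Toy example: one substituted word ("sat" -> "sit") and one dropped word ("the").
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because both metrics are normalized by the reference length, a value above 1.0 is possible when the hypothesis contains many insertions.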

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9876/11180099/f62795252de7/41598_2024_64848_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9876/11180099/7bd5c212d171/41598_2024_64848_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9876/11180099/651a607f0649/41598_2024_64848_Fig2_HTML.jpg

Similar articles

1
Multilingual end-to-end ASR for low-resource Turkic languages with common alphabets.
Sci Rep. 2024 Jun 15;14(1):13835. doi: 10.1038/s41598-024-64848-1.
2
A study of transformer-based end-to-end speech recognition system for Kazakh language.
Sci Rep. 2022 May 18;12(1):8337. doi: 10.1038/s41598-022-12260-y.
3
A Study of Speech Recognition for Kazakh Based on Unsupervised Pre-Training.
Sensors (Basel). 2023 Jan 12;23(2):870. doi: 10.3390/s23020870.
4
Improving Hybrid CTC/Attention Architecture for Agglutinative Language Speech Recognition.
Sensors (Basel). 2022 Sep 27;22(19):7319. doi: 10.3390/s22197319.
5
Perspectives and Experiences of Autistic Multilingual Adults: A Qualitative Analysis.
Autism Adulthood. 2021 Dec 1;3(4):310-319. doi: 10.1089/aut.2020.0067. Epub 2021 Dec 7.
6
End-to-end keyword search system based on attention mechanism and energy scorer for low resource languages.
Neural Netw. 2021 Jul;139:326-334. doi: 10.1016/j.neunet.2021.04.002. Epub 2021 Apr 10.
7
Assessing the speech production of multilingual children: A survey of speech-language therapists in French-speaking Belgium.
Int J Lang Commun Disord. 2023 Sep-Oct;58(5):1496-1509. doi: 10.1111/1460-6984.12875. Epub 2023 Apr 12.
8
Domain Generalization for Language-Independent Automatic Speech Recognition.
Front Artif Intell. 2022 May 12;5:806274. doi: 10.3389/frai.2022.806274. eCollection 2022.
9
Oral diadochokinetic rates across languages: Multilingual speakers comparison.
Int J Lang Commun Disord. 2021 Sep;56(5):1026-1036. doi: 10.1111/1460-6984.12653. Epub 2021 Jul 31.
10
A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems.
IEEE Trans Neural Netw Learn Syst. 2021 Aug;32(8):3608-3620. doi: 10.1109/TNNLS.2020.3015830. Epub 2021 Aug 3.
