End-to-end Jordanian dialect speech-to-text self-supervised learning framework.

Authors

Safieh Ali A, Alhaol Ibrahim Abu, Ghnemat Rawan

Affiliation

Data Science Department, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Amman, Jordan.

Publication

Front Robot AI. 2022 Dec 22;9:1090012. doi: 10.3389/frobt.2022.1090012. eCollection 2022.


DOI:10.3389/frobt.2022.1090012
PMID:36618013
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9815896/
Abstract

Speech-to-text engines are in high demand across many applications and are an essential enabler of human-robot interaction. However, some languages lack labeled speech data, particularly the Arabic dialects and other low-resource languages. Self-supervised training combined with self-training on noisy data has proven to be one of the most promising feasible solutions. This article proposes an end-to-end, transformer-based model and framework for low-resource languages. The framework also incorporates customized audio-to-text processing algorithms to achieve a highly efficient Jordanian Arabic dialect speech-to-text system. It can ingest data from many sources and makes it possible to obtain ground truth from external sources by speeding up manual annotation. Training uses noisy student training and self-supervised learning to exploit unlabeled data in both the pre- and post-training stages, and incorporates multiple types of data augmentation. The proposed self-training approach outperforms the fine-tuned Wav2Vec model by 5% in terms of word error rate reduction. This work provides the research community with a Jordanian-spoken data set and an end-to-end approach for low-resource languages, achieved by combining pretraining, post-training, and the injection of noisy labeled and augmented data with minimal human intervention. It enables new applications in Arabic speech-to-text, such as question-answering systems and intelligent control systems, and will add human-like perception and hearing sensors to intelligent robots.
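The abstract reports its result as a reduction in word error rate (WER). As a minimal sketch of that metric, assuming the standard definition (word-level edit distance divided by the number of reference words), the Python below may help; the function name `word_error_rate` is hypothetical and this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words. The abstract does not say whether its 5% figure is an absolute or a relative change in this quantity.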


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/092ec71eee42/frobt-09-1090012-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/796f3262cdb5/frobt-09-1090012-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/3e596d869e5a/frobt-09-1090012-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/54b5cbfee1b4/frobt-09-1090012-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/a71bf1573b21/frobt-09-1090012-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/c2b5d95788f7/frobt-09-1090012-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/6b77edf4f2b9/frobt-09-1090012-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/fcfbfdeb334c/frobt-09-1090012-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/04bc734a240a/frobt-09-1090012-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/7e87e3ba045d/frobt-09-1090012-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39ad/9815896/c22a09f230d7/frobt-09-1090012-g010.jpg
