Finnish parliament ASR corpus: Analysis, benchmarks and statistics.

Authors

Virkkunen Anja, Rouhe Aku, Phan Nhan, Kurimo Mikko

Affiliation

Department of Information and Communications Engineering, Aalto University, Espoo, Finland.

Publication information

Lang Resour Eval. 2023 Mar 27:1-26. doi: 10.1007/s10579-023-09650-7.

DOI:10.1007/s10579-023-09650-7
PMID:37360261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10040906/
Abstract

Public sources like parliament meeting recordings and transcripts provide ever-growing material for the training and evaluation of automatic speech recognition (ASR) systems. In this paper, we publish and analyse the Finnish Parliament ASR Corpus, the most extensive publicly available collection of manually transcribed speech data for Finnish with over 3000 h of speech and 449 speakers for which it provides rich demographic metadata. This corpus builds on earlier initial work, and as a result the corpus has a natural split into two training subsets from two periods of time. Similarly, there are two official, corrected test sets covering different times, setting an ASR task with longitudinal distribution-shift characteristics. An official development set is also provided. We developed a complete Kaldi-based data preparation pipeline and ASR recipes for hidden Markov models (HMM), hybrid deep neural networks (HMM-DNN), and attention-based encoder-decoders (AED). For HMM-DNN systems, we provide results with time-delay neural networks (TDNN) as well as state-of-the-art wav2vec 2.0 pretrained acoustic models. We set benchmarks on the official test sets and multiple other recently used test sets. Both temporal corpus subsets are already large, and we observe that beyond their scale, HMM-TDNN ASR performance on the official test sets has reached a plateau. In contrast, other domains and larger wav2vec 2.0 models benefit from added data. The HMM-DNN and AED approaches are compared in a carefully matched equal data setting, with the HMM-DNN system consistently performing better. Finally, the variation of the ASR accuracy is compared between the speaker categories available in the parliament metadata to detect potential biases based on factors such as gender, age, and education.
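
The per-category accuracy comparison mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes accuracy is measured as word error rate (WER), the usual ASR metric, pooled over all utterances in each speaker category. The field names (`ref`, `hyp`, `gender`) and the toy utterances are hypothetical and are not taken from the corpus or from the authors' Kaldi recipes.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(utterances, key):
    """Pool word errors and reference words per metadata category."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for utt in utterances:
        ref = utt["ref"].split()
        hyp = utt["hyp"].split()
        group = utt[key]
        errors[group] += edit_distance(ref, hyp)
        words[group] += len(ref)
    # Skip categories with no reference words to avoid division by zero.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical toy data: reference transcript, ASR hypothesis, one metadata field.
utterances = [
    {"ref": "arvoisa puhemies kiitos", "hyp": "arvoisa puhemies kiitos", "gender": "female"},
    {"ref": "hallitus esittää lakia", "hyp": "hallitus esittää", "gender": "male"},
    {"ref": "tämä on tärkeä asia", "hyp": "tämä on tärkeää asia", "gender": "male"},
]

result = wer_by_group(utterances, "gender")
for group, wer in sorted(result.items()):
    print(f"{group}: WER = {wer:.3f}")
```

Pooling errors before dividing weights each category by its total number of reference words, so long speeches count more than short ones. Averaging per-utterance WER would be a different, equally defensible choice.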

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/57eb/10040906/4ac763088416/10579_2023_9650_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/57eb/10040906/8c600b48fbab/10579_2023_9650_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/57eb/10040906/f7fa757e2666/10579_2023_9650_Fig3_HTML.jpg

Similar articles

1. Finnish parliament ASR corpus: Analysis, benchmarks and statistics.
   Lang Resour Eval. 2023 Mar 27:1-26. doi: 10.1007/s10579-023-09650-7.
2. Domain Adaptation with Augmented Data by Deep Neural Network Based Method Using Re-Recorded Speech for Automatic Speech Recognition in Real Environment.
   Sensors (Basel). 2022 Dec 16;22(24):9945. doi: 10.3390/s22249945.
3. Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT.
   PLoS One. 2018 Oct 10;13(10):e0205355. doi: 10.1371/journal.pone.0205355. eCollection 2018.
4. Using Automatic Speech Recognition to Assess Thai Speech Language Fluency in the Montreal Cognitive Assessment (MoCA).
   Sensors (Basel). 2022 Feb 17;22(4):1583. doi: 10.3390/s22041583.
5. Advances in Completely Automated Vowel Analysis for Sociophonetics: Using End-to-End Speech Recognition Systems With DARLA.
   Front Artif Intell. 2021 Sep 24;4:662097. doi: 10.3389/frai.2021.662097. eCollection 2021.
6. Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.
   Neural Netw. 2016 Jun;78:97-111. doi: 10.1016/j.neunet.2015.12.010. Epub 2015 Dec 30.
7. Using Hybrid HMM/DNN Embedding Extractor Models in Computational Paralinguistic Tasks.
   Sensors (Basel). 2023 May 30;23(11):5208. doi: 10.3390/s23115208.
8. Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks.
   Lang Resour Eval. 2022 Aug 9:1-33. doi: 10.1007/s10579-022-09606-3.
9. Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations.
   Comput Speech Lang. 2020 Sep;63. doi: 10.1016/j.csl.2020.101077. Epub 2020 Feb 18.
10. Research on Pig Sound Recognition Based on Deep Neural Network and Hidden Markov Models.
    Sensors (Basel). 2024 Feb 16;24(4):1269. doi: 10.3390/s24041269.

Cited by

1. Speech recognition datasets for low-resource Congolese languages.
   Data Brief. 2023 Nov 10;52:109796. doi: 10.1016/j.dib.2023.109796. eCollection 2024 Feb.