Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy.

Affiliation

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.

Publication information

Sensors (Basel). 2022 Apr 15;22(8):3027. doi: 10.3390/s22083027.


DOI:10.3390/s22083027
PMID:35459013
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9027119/
Abstract

Automatic speech recognition (ASR) is an essential technique for human-computer interaction, and gain control is a commonly used operation in ASR. However, inappropriate gain control strategies can increase the word error rate (WER) of ASR. Because the relationship between gain control and WER currently lacks sufficient theoretical analysis and proof, various unconstrained gain control strategies have been adopted in realistic ASR systems, and the optimal gain control, in the sense of the lowest WER, is rarely achieved. This study proposes a gain control strategy named maximized original signal transmission (MOST) to minimize the adverse impact of gain control on ASR systems. First, by modeling the gain control strategy, the quantitative relationship between the gain control strategy and ASR performance was established using the noise figure index. Second, through an analysis of this quantitative relationship, an optimal MOST gain control strategy with minimal performance degradation was theoretically deduced. Finally, comprehensive comparative experiments on a Mandarin dataset show that the proposed MOST gain control strategy can significantly reduce the WER of the experimental ASR system, with a 10% mean absolute WER reduction at -9 dB gain.
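
To make the two quantities in the abstract concrete, the following is a minimal Python sketch of how a gain given in dB maps to a linear amplitude factor and how WER is conventionally computed from an edit distance. It is not the paper's MOST strategy or its noise-figure model. The function names (`db_to_linear`, `edit_distance`, `word_error_rate`) and the sample tokens are assumptions introduced for illustration, and the sketch assumes the gain is an amplitude gain (20·log10 convention).

```python
def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear scale factor (assumed 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def word_error_rate(ref: list, hyp: list) -> float:
    """WER = edit distance divided by the number of reference tokens."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical tokens, not data from the paper.
reference = ["a", "b", "c", "d"]
hypothesis = ["a", "x", "c"]          # one substitution, one deletion

scale_at_minus_9_db = db_to_linear(-9.0)   # roughly 0.355 of the original amplitude
wer = word_error_rate(reference, hypothesis)
```

Under these assumptions, a -9 dB gain scales the waveform amplitude to roughly 35% of its original value. An "absolute" WER reduction, as reported in the abstract, is a difference in percentage points between two WER values rather than a ratio of them.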

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/897f33a5491e/sensors-22-03027-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/40fd373c63fc/sensors-22-03027-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/6ebe31e26eb7/sensors-22-03027-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/32973b7bfed0/sensors-22-03027-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/c2a1e22e5789/sensors-22-03027-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/99acc6bb7f7a/sensors-22-03027-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/5d815155cea7/sensors-22-03027-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/f7ebcfc9b083/sensors-22-03027-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/211bfe98d8d0/sensors-22-03027-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/ec25373b1d50/sensors-22-03027-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/8b2d6f36b11e/sensors-22-03027-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/47a5938ed2bb/sensors-22-03027-g0A1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/460e/9027119/2914f2e5a95e/sensors-22-03027-g0A2.jpg
