
Cluster-Based Pairwise Contrastive Loss for Noise-Robust Speech Recognition.

Authors

Lee Geon Woo, Kim Hong Kook

Affiliations

AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea.

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea.

Publication information

Sensors (Basel). 2024 Apr 17;24(8):2573. doi: 10.3390/s24082573.

DOI: 10.3390/s24082573
PMID: 38676191
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11054889/
Abstract

This paper addresses a joint training approach applied to a pipeline comprising speech enhancement (SE) and automatic speech recognition (ASR) models, where an acoustic tokenizer is included in the pipeline to leverage the linguistic information from the ASR model to the SE model. The acoustic tokenizer takes the outputs of the ASR encoder and provides a pseudo-label through K-means clustering. To transfer the linguistic information, represented by pseudo-labels, from the acoustic tokenizer to the SE model, a cluster-based pairwise contrastive (CBPC) loss function is proposed, which is a self-supervised contrastive loss function, and combined with an information noise contrastive estimation (infoNCE) loss function. This combined loss function prevents the SE model from overfitting to outlier samples and represents the pronunciation variability in samples with the same pseudo-label. The effectiveness of the proposed CBPC loss function is evaluated on a noisy LibriSpeech dataset by measuring both the speech quality scores and the word error rate (WER). The experimental results reveal that the proposed joint training approach using the described CBPC loss function achieves a lower WER than the conventional joint training approaches. In addition, it is demonstrated that the speech quality scores of the SE model trained using the proposed training approach are higher than those of the standalone-SE model and SE models trained using conventional joint training approaches. An ablation study is also conducted to investigate the effects of different combinations of loss functions on the speech quality scores and WER. Here, it is revealed that the proposed CBPC loss function combined with infoNCE contributes to a reduced WER and an increase in most of the speech quality scores.
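
The abstract names two building blocks, K-means pseudo-labels and an infoNCE term, without giving the exact form of the CBPC loss. The sketch below illustrates only those two generic pieces in plain Python: a small K-means that turns frame vectors into pseudo-labels, and a standard InfoNCE loss over cosine similarities. The function names (`kmeans_pseudo_labels`, `info_nce`), the toy frame vectors, the temperature value, and the choice of using pseudo-labels to pick positives and negatives are all assumptions made for illustration; this is not the authors' CBPC formulation or training code.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_pseudo_labels(frames, k, iters=20, seed=0):
    """Plain K-means; returns one cluster index (pseudo-label) per frame."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: nearest centroid for every frame.
        labels = [
            min(range(k), key=lambda j: sq_dist(f, centroids[j]))
            for f in frames
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log(exp(s_pos/t) / (exp(s_pos/t) + sum_i exp(s_neg_i/t)))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy stand-ins for ASR-encoder output frames (hypothetical values).
frames = [
    [1.0, 0.1], [0.9, 0.0], [1.1, -0.1],
    [-1.0, 0.1], [-0.9, 0.0], [-1.1, -0.1],
]
labels = kmeans_pseudo_labels(frames, k=2)

# Assumption: frames sharing the anchor's pseudo-label act as positives,
# frames with a different pseudo-label act as negatives.
anchor = frames[0]
same = [f for f, lab in zip(frames[1:], labels[1:]) if lab == labels[0]]
other = [f for f, lab in zip(frames[1:], labels[1:]) if lab != labels[0]]

loss_same = info_nce(anchor, same[0], other)   # positive from the same cluster
loss_diff = info_nce(anchor, other[0], same)   # positive from the other cluster
print(labels, loss_same, loss_diff)
```

In this toy setting the loss is small when the positive shares the anchor's pseudo-label and large when it does not, which is the behaviour a contrastive term of this kind is meant to reward.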
