F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech.

Authors

Zhang Yixuan, Wang Heming, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, Ohio State University, Columbus, OH 43210 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2023;31:3760-3770. doi: 10.1109/TASLP.2023.3313427. Epub 2023 Sep 13.

Abstract

As a fundamental problem in speech processing, pitch tracking has been studied for decades. While strong performance has been achieved on clean speech, pitch tracking in noisy speech is still challenging. Severe non-stationary noises not only corrupt the harmonic structure in voiced intervals but also make it difficult to determine the existence of voiced speech. Given the importance of voicing detection for pitch tracking, this study proposes a neural cascade architecture that jointly performs pitch estimation and voicing detection. The cascade architecture optimizes a speech enhancement module and a pitch tracking module, and is trained in a speaker-independent and noise-independent way. It is observed that incorporating the enhancement module improves both pitch estimation and voicing detection accuracy, especially in low signal-to-noise ratio (SNR) conditions. In addition, compared with frameworks that combine corresponding single-task models, the proposed multi-task framework achieves better performance and is more efficient. Experimental results show that the proposed method is robust to different noise conditions and substantially outperforms other competitive pitch tracking methods.
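The data flow described in the abstract (an enhancement stage feeding a pitch tracking stage that emits both a pitch estimate and a voicing decision) can be illustrated with a minimal sketch. This is not the authors' neural model: `enhance`, `track_pitch`, `cascade`, the 60-400 Hz search range, and the 0.5 voicing threshold are all hypothetical stand-ins, and classical normalized autocorrelation is swapped in for the trained networks purely to make the structure runnable.

```python
import math
from typing import List, Tuple


def enhance(frame: List[float]) -> List[float]:
    """Placeholder for the enhancement module (assumption: the paper uses a
    trained network here). This toy version only removes the DC offset."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def track_pitch(
    frame: List[float],
    sample_rate: int,
    fmin: float = 60.0,             # hypothetical lower bound of the F0 search
    fmax: float = 400.0,            # hypothetical upper bound of the F0 search
    voicing_threshold: float = 0.5  # hypothetical decision threshold
) -> Tuple[float, bool]:
    """Placeholder for the pitch tracking module. One autocorrelation pass
    yields both outputs: the best lag gives F0, its strength gives voicing."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0, False  # silent frame: unvoiced, no pitch
    best_lag, best_score = 0, 0.0
    for lag in range(int(sample_rate / fmax), int(sample_rate / fmin) + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    voiced = best_score >= voicing_threshold
    return (sample_rate / best_lag if voiced else 0.0), voiced


def cascade(frame: List[float], sample_rate: int) -> Tuple[float, bool]:
    """Cascade: the pitch tracker consumes the enhanced frame, not the raw one."""
    return track_pitch(enhance(frame), sample_rate)
```

Calling `cascade(frame, sample_rate)` on one frame of samples returns an `(f0_hz, voiced)` pair; the point of the sketch is only that a single pass over the enhanced signal produces both outputs, which mirrors the multi-task framing in the abstract.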

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/42b6/12048035/eaa90b99026f/nihms-2076299-f0001.jpg

Similar articles

4
Cross-Domain Speech Enhancement With a Neural Cascade Architecture.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7862-7866. doi: 10.1109/icassp43922.2022.9747752. Epub 2022 Apr 27.
6
Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2022;30:734-743. doi: 10.1109/taslp.2021.3138716. Epub 2021 Dec 28.
8
Robust Harmonic Features for Classification-Based Pitch Estimation.
IEEE/ACM Trans Audio Speech Lang Process. 2017 May;25(5):952-964. doi: 10.1109/TASLP.2017.2667879. Epub 2017 Feb 13.
9
Neural Cascade Architecture for Joint Acoustic Echo and Noise Suppression.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:671-675. doi: 10.1109/icassp43922.2022.9747445. Epub 2022 Apr 27.

