

Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication

Authors

Liu Miao, Wang Jing, Wang Fei, Xiang Fei, Chen Jingdong

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):174-187. doi: 10.1109/TNNLS.2023.3321076. Epub 2025 Jan 7.

DOI: 10.1109/TNNLS.2023.3321076
PMID: 37824322
Abstract

Traditionally, speech quality evaluation relies on subjective assessments or intrusive methods that require reference signals or additional equipment. However, over recent years, non-intrusive speech quality assessment has emerged as a promising alternative, capturing much attention from researchers and industry professionals. This article presents a deep learning-based method that exploits large-scale intrusive simulated data to improve the accuracy and generalization of non-intrusive methods. The major contributions of this article are as follows. First, it presents a data simulation method, which generates degraded speech signals and labels their speech quality with the perceptual objective listening quality assessment (POLQA). The generated data is proven to be useful for pretraining the deep learning models. Second, it proposes to apply an adversarial speaker classifier to reduce the impact of speaker-dependent information on speech quality evaluation. Third, an autoencoder-based deep learning scheme is proposed following the principle of representation learning and adversarial training (AT) methods, which is able to transfer the knowledge learned from a large amount of simulated speech data labeled by POLQA. With the help of discriminative representations extracted from the autoencoder, the prediction model can be trained well on a relatively small amount of speech data labeled through subjective listening tests. Fourth, an end-to-end speech quality evaluation neural network is developed, which takes magnitude and phase spectral features as its inputs. This phase-aware model is more accurate than the model using only the magnitude spectral features. A large number of experiments are carried out with three datasets: one simulated with labels obtained using POLQA and two recorded with labels obtained using subjective listening tests. 
The results show that the presented phase-aware method improves the performance of the baseline model and the proposed model with latent representations extracted from the adversarial autoencoder (AAE) outperforms the state-of-the-art objective quality assessment methods, reducing the root mean square error (RMSE) by 10.5% and 12.2% on the Beijing Institute of Technology (BIT) dataset and Tencent Corpus, respectively. The code and supplementary materials are available at https://github.com/liushenme/AAE-SQA.
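The phase-aware front end the abstract describes — feeding both magnitude and phase spectral features to the network — can be illustrated with a short sketch. This is not the authors' implementation: the helper name `stft_features`, the 512-sample Hann window, and the 256-sample hop are assumptions chosen for illustration; the paper's model consumes the two resulting feature maps jointly.

```python
import numpy as np

def stft_features(signal, frame_len=512, hop=256):
    """Compute magnitude and phase spectrograms of a mono signal.

    Frames the signal with a Hann window and takes the real FFT of
    each frame, returning the two feature maps a phase-aware model
    would consume (parameters here are illustrative assumptions).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spec = np.fft.rfft(frames, axis=1)   # (n_frames, frame_len // 2 + 1)
    return np.abs(spec), np.angle(spec)  # magnitude, phase

# Example: 1 s of white noise at 16 kHz
x = np.random.default_rng(0).standard_normal(16000)
mag, phase = stft_features(x)
print(mag.shape, phase.shape)  # both (61, 257)
```

A magnitude-only baseline would discard the second return value; the paper's finding is that keeping the phase map improves prediction accuracy.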
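The adversarial speaker classifier is commonly realized with a gradient reversal layer (GRL), though the abstract does not name the exact mechanism, so treat this as an assumed technique. The sketch below shows the idea with a hand-derived backward pass on toy linear "networks": the GRL is the identity in the forward pass and negates the gradient in the backward pass, so the encoder is pushed away from representations that help identify the speaker. All shapes, weights, and targets are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))      # batch of 8 input feature vectors
W_enc = rng.standard_normal((4, 3))  # toy "encoder" weights
W_spk = rng.standard_normal((3, 1))  # toy "speaker classifier" weights
y = rng.standard_normal((8, 1))      # fake speaker targets

# Forward: GRL is the identity, so z passes through unchanged.
z = x @ W_enc
pred = z @ W_spk
err = pred - y                       # d(MSE)/d(pred), up to a constant

# Backward: the classifier receives the true gradient ...
grad_W_spk = z.T @ err
# ... but the GRL negates the gradient flowing into the encoder,
# so the encoder is updated to HURT speaker classification.
grad_z = err @ W_spk.T
grad_z_reversed = -grad_z
grad_W_enc = x.T @ grad_z_reversed
```

In a full model this adversarial term would be combined with the quality-prediction loss, so the shared encoder keeps quality-relevant information while shedding speaker-dependent cues.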


Similar Articles

1. Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication.
   IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):174-187. doi: 10.1109/TNNLS.2023.3321076. Epub 2025 Jan 7.
2. NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram.
   Sensors (Basel). 2023 Jun 16;23(12):5652. doi: 10.3390/s23125652.
3. Speech quality estimation with deep lattice networks.
   J Acoust Soc Am. 2021 Jun;149(6):3851. doi: 10.1121/10.0005130.
4. Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.
   J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
5. Improving Speech Emotion Recognition With Adversarial Data Augmentation Network.
   IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):172-184. doi: 10.1109/TNNLS.2020.3027600. Epub 2022 Jan 5.
6. SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network.
   Math Biosci Eng. 2024 Feb 21;21(3):3860-3875. doi: 10.3934/mbe.2024172.
7. CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks.
   Neural Netw. 2021 Jul;139:305-325. doi: 10.1016/j.neunet.2021.03.017. Epub 2021 Mar 19.
8. Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG).
   IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.
9. A medical image classification method based on self-regularized adversarial learning.
   Med Phys. 2024 Nov;51(11):8232-8246. doi: 10.1002/mp.17320. Epub 2024 Jul 30.
10. A multimodal dynamical variational autoencoder for audiovisual speech representation learning.
    Neural Netw. 2024 Apr;172:106120. doi: 10.1016/j.neunet.2024.106120. Epub 2024 Jan 11.