Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection.

Affiliation

Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, 230601, China.

Publication information

Neural Netw. 2024 Jul;175:106320. doi: 10.1016/j.neunet.2024.106320. Epub 2024 Apr 16.

DOI: 10.1016/j.neunet.2024.106320
PMID: 38640696
Abstract

The rhythm of bona fide speech is often difficult to replicate, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. The F0 feature is therefore expected to carry discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. To model the F0 subband effectively and thereby improve FSD performance, we also propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net serves as the backbone network to obtain multiscale information and is enhanced with a spatial reconstruction mechanism to avoid losing important information as the channel groups are repeatedly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that the proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving state-of-the-art performance among all single systems.
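
The abstract reports its main result as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how an EER could be estimated from detector scores. It is not the authors' evaluation code or the ASVspoof scoring tool; the function name compute_eer, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate from two lists of detector scores.

    Assumption (not from the paper): a higher score means the detector
    considers the utterance more likely to be bona fide.
    Returns (eer, threshold) at the threshold where the false acceptance
    rate and false rejection rate are closest to each other.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best_gap, best_eer, best_threshold = None, None, None
    for t in thresholds:
        # False rejection: bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # With finite data the two rates rarely match exactly,
            # so report their average at the closest point.
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold


# Toy scores (hypothetical): one bona fide utterance scores below one spoof.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.6
```

In this toy example one of four bona fide scores falls below the threshold and one of four spoof scores reaches it, so both error rates are 25%. An EER of 0.47%, as reported in the abstract, would mean both rates sit at roughly one error per 200 trials at the balancing threshold.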
