Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification.

Affiliations

Electrical Engineering Department, Prince Mohammad bin Fahd University, P.O. Box 1664, Al Khobar 31952, Saudi Arabia.

Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia.

Publication information

Sensors (Basel). 2023 Jul 24;23(14):6637. doi: 10.3390/s23146637.

DOI: 10.3390/s23146637
PMID: 37514931
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10386291/
Abstract

Voice-controlled devices are in demand because they offer hands-free control. However, using them in sensitive scenarios such as smartphone applications and financial transactions requires protection against fraudulent attacks known as "speech spoofing". Because the algorithms used in spoofing attacks are largely unknown in practice, spoof-detection models need further analysis and development to improve spoof classification. A study of the spoofed-speech spectrum suggests that high-frequency features discriminate well between genuine and spoofed speech. High-frequency features are typically obtained with linear or triangular filter banks; however, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferred over other speech features because of their lower covariance. This study therefore proposes using a Gaussian filter to extract inverted MFCC (iMFCC) features, which capture high-frequency information. Complementary features are combined with the iMFCCs to strengthen their ability to discriminate spoofed speech. Deep learning has proven effective in classification applications, but the choice of hyper-parameters and architecture is crucial and directly affects performance, so a Bayesian algorithm is used to optimize the BiLSTM network. In this way, we build a high-frequency-feature-based, optimized BiLSTM network to classify spoofed-speech signals, and we present an extensive investigation on the ASVspoof 2017 dataset. The optimized BiLSTM model was trained successfully in the fewest epochs and achieved a validation accuracy of 99.58%. The proposed algorithm achieved an EER of 6.58% on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
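The feature-extraction idea in the abstract (inverted MFCCs computed with Gaussian rather than triangular filters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the inverted filter bank is obtained by mirroring a mel-spaced Gaussian filter bank across the frequency axis (a common iMFCC construction), and every parameter (`n_fft=512`, 400-sample frames with a 160-sample hop, 20 filters, 13 coefficients, a Gaussian width of one quarter of the spacing between neighboring centers, 0.97 pre-emphasis) is a hypothetical choice. The helper names are likewise illustrative, and the complementary features and the BiLSTM classifier are outside its scope.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def gaussian_mel_bank(n_filters, n_fft, sr):
    """Gaussian filters centered at mel-spaced frequencies (rows = filters)."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 mel-spaced points from 0 Hz to Nyquist; the inner ones are centers.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bin_pts = mel_to_hz(mel_pts) * n_fft / sr  # fractional FFT-bin positions
    k = np.arange(n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        # Hypothetical width: a quarter of the distance between neighboring points.
        sigma = (bin_pts[i + 1] - bin_pts[i - 1]) / 4.0
        bank[i - 1] = np.exp(-0.5 * ((k - bin_pts[i]) / sigma) ** 2)
    return bank


def inverted_gaussian_bank(n_filters, n_fft, sr):
    # Mirror across frequency (dense filters move to high frequencies) and
    # reverse the row order so filter centers still increase with the index.
    return gaussian_mel_bank(n_filters, n_fft, sr)[::-1, ::-1]


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat


def frame_signal(x, frame_len, hop):
    # Assumes len(x) >= frame_len; trailing samples that do not fill a frame are dropped.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def imfcc(x, sr, n_fft=512, frame_len=400, hop=160, n_filters=20, n_ceps=13):
    """Inverted-MFCC-style features from a Gaussian filter bank, one row per frame."""
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])  # pre-emphasis
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ inverted_gaussian_bank(n_filters, n_fft, sr).T
    log_e = np.log(energies + 1e-10)  # small floor avoids log(0)
    return log_e @ dct_ii_matrix(n_ceps, n_filters).T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.default_rng(0).standard_normal(sr)
    print(imfcc(x, sr).shape)  # (98, 13): one second of audio at a 160-sample hop
```

Mirroring the bank places the narrow, closely spaced filters near the Nyquist frequency, so the resulting cepstra resolve the high-frequency band in more detail, which is the property the abstract relies on to separate genuine from spoofed speech.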

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/00c4b24264d9/sensors-23-06637-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/20f0ca245d16/sensors-23-06637-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/03fe83817da5/sensors-23-06637-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/1b25a1a8817a/sensors-23-06637-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/e0582e189c68/sensors-23-06637-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/dcf2cbb77455/sensors-23-06637-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/67bedf0b88a3/sensors-23-06637-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/91c5629ebf29/sensors-23-06637-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/c235c73b7050/sensors-23-06637-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/ea8450c54cc4/sensors-23-06637-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/7182f421e02e/sensors-23-06637-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f7b1/10386291/653981cf6d7a/sensors-23-06637-g012.jpg

Similar articles

1
Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification.
Sensors (Basel). 2023 Jul 24;23(14):6637. doi: 10.3390/s23146637.
2
A blended framework for audio spoof detection with sequential models and bags of auditory bites.
Sci Rep. 2024 Aug 30;14(1):20192. doi: 10.1038/s41598-024-71026-w.
3
Voice spoofing detection using a neural networks assembly considering spectrograms and mel frequency cepstral coefficients.
PeerJ Comput Sci. 2023 Dec 18;9:e1740. doi: 10.7717/peerj-cs.1740. eCollection 2023.
4
Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection.
Circuits Syst Signal Process. 2023 Feb 23:1-19. doi: 10.1007/s00034-023-02314-5.
5
Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features.
IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):4633-4644. doi: 10.1109/TNNLS.2017.2771947. Epub 2017 Dec 4.
6
IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
7
Face anti-spoofing with cross-stage relation enhancement and spoof material perception.
Neural Netw. 2024 Jul;175:106275. doi: 10.1016/j.neunet.2024.106275. Epub 2024 Mar 27.
8
Bi-FPNFAS: Bi-Directional Feature Pyramid Network for Pixel-Wise Face Anti-Spoofing by Leveraging Fourier Spectra.
Sensors (Basel). 2021 Apr 15;21(8):2799. doi: 10.3390/s21082799.
9
Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.
Biomed Tech (Berl). 2021 Nov 29;66(6):613-625. doi: 10.1515/bmt-2021-0112. Print 2021 Dec 20.
10
Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations.
Speech Commun. 2021 Oct;133:41-61. doi: 10.1016/j.specom.2021.07.010. Epub 2021 Aug 18.
