Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection

Authors

Rostami Amir Mohammad, Homayounpour Mohammad Mehdi, Nickabadi Ahmad

Affiliation

Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran.

Publication

Circuits Syst Signal Process. 2023 Feb 23:1-19. doi: 10.1007/s00034-023-02314-5.

DOI:10.1007/s00034-023-02314-5

PMID:36852137

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9947936/

Abstract

Many studies have sought to develop countermeasures that make Automatic Speaker Verification (ASV) systems more robust against spoofing attacks. As evidenced by the latest ASVspoof 2019 countermeasure challenge, even the best models currently deployed for ASV generalize poorly to unseen attacks. Jointly improving the components of an ASV spoof detection system, namely the classifier, the feature extraction stage, and the model loss function, may lead to better detection of attacks. Accordingly, the present study proposes the Efficient Attention Branch Network (EABN) architecture with a combined loss function to improve model generalization to unseen attacks. The EABN consists of an attention branch and a perception branch. The attention branch produces an attention mask that improves classification performance and is also interpretable from a human point of view. The perception branch performs the main task, spoof detection. The new EfficientNet-A0 architecture was optimized and employed for the perception branch; it has nearly ten times fewer parameters and approximately seven times fewer floating-point operations than SE-Res2Net50, the best existing network. On the ASVspoof 2019 dataset, the proposed method achieved EER = 0.86% and t-DCF = 0.0239 in the Physical Access (PA) scenario using logPowSpec as the input feature. Furthermore, using the LFCC feature and SE-Res2Net50 for the perception branch, the proposed model achieved EER = 1.89% and t-DCF = 0.507 in the Logical Access (LA) scenario, which, to the best of our knowledge, makes it the best single-system ASV spoofing countermeasure.
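The EER (equal error rate) figures quoted in the abstract refer to the operating point at which the false-acceptance rate (spoofed trials accepted) equals the false-rejection rate (bona fide trials rejected). The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `compute_eer`, the toy scores, and the convention that higher scores mean "bona fide" are all assumptions made for illustration. The t-DCF metric is not covered here, since it also depends on the errors of the ASV system and on cost parameters that the abstract does not give.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate of a spoof detector.

    Assumes (hypothetically) that higher scores mean 'more likely bona fide'.
    Returns (eer, threshold), where eer is a fraction in [0, 1].
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer, best_thr = float("inf"), None, None
    for thr in thresholds:
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finitely many trials the two rates rarely match exactly,
            # so report their average at the closest crossing point.
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr


# Toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, thr = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {thr}")
```

On real evaluation sets the score lists contain many thousands of trials, so sorting once and sweeping cumulative counts would be more efficient than this quadratic loop; the simple form is kept here for readability.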

