
Bidirectional Attention for Text-Dependent Speaker Verification.

Affiliations

School of Information Science and Technology, University of Science and Technology of China, Hefei 230022, China.

iFLYTEK Research, iFLYTEK Co., Ltd., Hefei 230088, China.

Publication

Sensors (Basel). 2020 Nov 27;20(23):6784. doi: 10.3390/s20236784.

DOI: 10.3390/s20236784
PMID: 33261046
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7730222/
Abstract

Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep learning-based methods have demonstrated promising results, whereas a few problems still require better solutions. In prior works examining speaker discriminative neural networks, the speaker representation of the target speaker is regarded as a fixed one when comparing with utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce one individual cost function to identify the phonetic contents, which contributes to calculating the attention score more specifically. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method archived a competitive equal error rate of 6.26% on the internal "DAN DAN NI HAO" benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.
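The core idea — each enrollment frame attends over evaluation frames and vice versa, so the resulting features depend jointly on both utterances rather than on a fixed speaker representation — can be sketched in NumPy. This is an illustrative toy, not the authors' architecture: the function names, feature dimensions, and plain dot-product scoring are assumptions, and the CNN front end and the separate phonetic-content loss from the paper are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(enroll, evaluate):
    """Toy bidirectional attention between two frame-level feature sequences.

    enroll:   (T_enr, D) features of the enrollment utterance
    evaluate: (T_eval, D) features of the evaluation utterance
    Returns interactive features for both directions.
    """
    # Frame-by-frame similarity matrix, shape (T_enr, T_eval).
    scores = enroll @ evaluate.T
    # Each enrollment frame attends over all evaluation frames ...
    enroll_ctx = softmax(scores, axis=1) @ evaluate     # (T_enr, D)
    # ... and each evaluation frame attends over all enrollment frames.
    eval_ctx = softmax(scores.T, axis=1) @ enroll       # (T_eval, D)
    return enroll_ctx, eval_ctx

rng = np.random.default_rng(0)
enroll = rng.standard_normal((50, 64))    # hypothetical 50 frames, 64-dim features
evaluate = rng.standard_normal((80, 64))  # hypothetical 80 frames
e_ctx, v_ctx = bidirectional_attention(enroll, evaluate)
print(e_ctx.shape, v_ctx.shape)
```

The key contrast with the fixed-representation baselines the abstract mentions: `e_ctx` changes with every evaluation utterance, so the comparison uses joint information instead of a constant speaker embedding; in the paper these interactive features complement, rather than replace, the constant ones.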


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/2915e11c0ba2/sensors-20-06784-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/3c3b558f56bd/sensors-20-06784-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/db5a82b5d29c/sensors-20-06784-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/fa6a250f2518/sensors-20-06784-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/03223f52a65e/sensors-20-06784-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/2def4beb93a8/sensors-20-06784-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2b82/7730222/c30b7258bbeb/sensors-20-06784-g007.jpg

Similar articles

1. Bidirectional Attention for Text-Dependent Speaker Verification.
Sensors (Basel). 2020 Nov 27;20(23):6784. doi: 10.3390/s20236784.
2. Attention-Based Temporal-Frequency Aggregation for Speaker Verification.
Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.
3. Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.
J Acoust Soc Am. 2020 Nov;148(5):2912. doi: 10.1121/10.0002455.
4. Few-shot short utterance speaker verification using meta-learning.
PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
5. Learning speaker-specific characteristics with a deep neural architecture.
IEEE Trans Neural Netw. 2011 Nov;22(11):1744-56. doi: 10.1109/TNN.2011.2167240. Epub 2011 Sep 26.
6. Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.
Sensors (Basel). 2024 Sep 25;24(19):6213. doi: 10.3390/s24196213.
7. H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model.
Neural Netw. 2021 Oct;142:329-339. doi: 10.1016/j.neunet.2021.05.024. Epub 2021 May 25.
8. Audio-Visual Fusion Based on Interactive Attention for Person Verification.
Sensors (Basel). 2023 Dec 15;23(24):9845. doi: 10.3390/s23249845.
9. Minimum classification error-based weighted support vector machine kernels for speaker verification.
J Acoust Soc Am. 2013 Apr;133(4):EL307-13. doi: 10.1121/1.4794350.
10. Partially supervised speaker clustering.
IEEE Trans Pattern Anal Mach Intell. 2012 May;34(5):959-71. doi: 10.1109/TPAMI.2011.174.

Cited by

1. Attention-Based Temporal-Frequency Aggregation for Speaker Verification.
Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.

References

1. Adversarially Learned Total Variability Embedding for Speaker Recognition with Random Digit Strings.
Sensors (Basel). 2019 Oct 30;19(21):4709. doi: 10.3390/s19214709.
2. Forensic Speaker Verification Using Ordinary Least Squares.
Sensors (Basel). 2019 Oct 10;19(20):4385. doi: 10.3390/s19204385.
3. Sensorineural hearing loss degrades behavioral and physiological measures of human spatial selective auditory attention.
Proc Natl Acad Sci U S A. 2018 Apr 3;115(14):E3286-E3295. doi: 10.1073/pnas.1721226115. Epub 2018 Mar 19.
4. Modelling auditory attention.
Philos Trans R Soc Lond B Biol Sci. 2017 Feb 19;372(1714). doi: 10.1098/rstb.2016.0101. Epub 2017 Jan 2.