Few-shot short utterance speaker verification using meta-learning.

Authors

Wang Weijie, Zhao Hong, Yang Yikun, Chang YouKang, You Haojie

Affiliations

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China.

School of Information Science & Engineering, Lanzhou University, Lanzhou, China.

Publication information

PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.

DOI:10.7717/peerj-cs.1276
PMID:37346533
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10280689/
Abstract

In practical applications, short-utterance speaker verification (SV) is the task of accepting or rejecting a speaker's identity claim based on a few enrollment utterances. Traditional methods use deep neural networks to extract speaker representations for verification. Recently, several meta-learning approaches have learned a deep distance metric to distinguish speakers within meta-tasks. Among them, a prototypical network learns a metric space in which the distance to each speaker's prototype center can be computed in order to classify speaker identity. We use emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN) to implement the embedding function required by the prototypical network, a nonlinear mapping from the input space to the metric space for the few-shot SV task. In addition, optimizing only for the speakers in a given meta-task may not be sufficient to learn distinctive speaker features. We therefore use an episodic training strategy in which the classes of the support and query sets correspond to the classes of the entire training set, which further improves model performance. The proposed model outperforms the comparison models on the VoxCeleb1 dataset and has a wide range of practical applications.
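The prototype-and-distance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D toy vectors stand in for embeddings that an encoder such as ECAPA-TDNN would produce, squared Euclidean distance is assumed because it is the common choice for prototypical networks (the paper may use a different metric), and every function name, speaker label, and parameter value below is hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (used as the class prototype)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each speaker to the mean of that speaker's support embeddings."""
    return {spk: mean_vector(embs) for spk, embs in support.items()}


def classify(query, prototypes):
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {spk: -squared_euclidean(query, p) for spk, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {spk: math.exp(v - top) for spk, v in logits.items()}
    total = sum(exps.values())
    return {spk: e / total for spk, e in exps.items()}


def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode; dataset maps speaker -> list of embeddings."""
    speakers = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for spk in speakers:
        picked = rng.sample(dataset[spk], k_shot + n_query)
        support[spk] = picked[:k_shot]
        query[spk] = picked[k_shot:]
    return support, query


# Toy data: three hypothetical speakers, six noisy 2-D "embeddings" each.
rng = random.Random(0)
centers = {"spk_a": [0.0, 0.0], "spk_b": [5.0, 5.0], "spk_c": [-5.0, 5.0]}
dataset = {
    spk: [[c + rng.uniform(-0.5, 0.5) for c in center] for _ in range(6)]
    for spk, center in centers.items()
}

support, query = sample_episode(dataset, n_way=2, k_shot=3, n_query=2, rng=rng)
prototypes = build_prototypes(support)

for spk, embs in query.items():
    for q in embs:
        probs = classify(q, prototypes)
        predicted = max(probs, key=probs.get)
        print(spk, "->", predicted, round(probs[predicted], 4))
```

In training, the negative log of the probability assigned to the true speaker would serve as the episode loss. The abstract's point about matching the episode classes to the classes of the entire training set is not modeled in this sketch.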


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c222/10280689/fec6f011bac1/peerj-cs-09-1276-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c222/10280689/de30a723ae1d/peerj-cs-09-1276-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c222/10280689/3f7296f2b228/peerj-cs-09-1276-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c222/10280689/ed285ca2c2c5/peerj-cs-09-1276-g003.jpg

Similar articles

1
Few-shot short utterance speaker verification using meta-learning.
PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
2
Bidirectional Attention for Text-Dependent Speaker Verification.
Sensors (Basel). 2020 Nov 27;20(23):6784. doi: 10.3390/s20236784.
3
Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings.
J Forensic Sci. 2023 May;68(3):871-883. doi: 10.1111/1556-4029.15250. Epub 2023 Mar 31.
4
H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model.
Neural Netw. 2021 Oct;142:329-339. doi: 10.1016/j.neunet.2021.05.024. Epub 2021 May 25.
5
Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.
Sensors (Basel). 2024 Sep 25;24(19):6213. doi: 10.3390/s24196213.
6
Partially supervised speaker clustering.
IEEE Trans Pattern Anal Mach Intell. 2012 May;34(5):959-71. doi: 10.1109/TPAMI.2011.174.
7
Attention-Based Temporal-Frequency Aggregation for Speaker Verification.
Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.
8
Meta-Prototypical Learning for Domain-Agnostic Few-Shot Recognition.
IEEE Trans Neural Netw Learn Syst. 2022 Nov;33(11):6990-6996. doi: 10.1109/TNNLS.2021.3083650. Epub 2022 Oct 27.
9
Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations.
J Acoust Soc Am. 2022 Feb;151(2):1077. doi: 10.1121/10.0009405.
10
Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization.
Sensors (Basel). 2024 Jun 29;24(13):4229. doi: 10.3390/s24134229.

Cited by

1
ClinClip: a Multimodal Language Pre-training model integrating EEG data for enhanced English medical listening assessment.
Front Neurosci. 2025 Jan 7;18:1493163. doi: 10.3389/fnins.2024.1493163. eCollection 2024.
