Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.

Affiliations

Department of Computer Engineering, Chosun University, Gwangju 61452, Republic of Korea.

Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam 13509, Republic of Korea.

Publication information

Sensors (Basel). 2024 Sep 25;24(19):6213. doi: 10.3390/s24196213.

DOI:10.3390/s24196213
PMID:39409253
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11478696/
Abstract

Speaker recognition identifies the speaker of an input utterance by extracting speaker-discriminative features from the speech signal. Because it is used for system security and authentication, extracting features unique to each speaker is crucial for achieving high recognition rates. Representative approaches either train a classifier or use contrastive learning to learn the speaker relationships between representations, and then take embeddings from a specific layer of the model. This paper introduces a framework for developing robust speaker recognition models through contrastive learning. The approach aims to minimize similarity to hard negative samples, that is, genuine negatives whose features are so similar to the positives that they can be mistaken for them. Specifically, the proposed method trains the model by estimating hard negative samples within a mini-batch during contrastive learning, and then uses a cross-attention mechanism to determine speaker agreement for pairs of utterances. To demonstrate its effectiveness, we compared a deep learning model trained with a conventional speaker recognition loss function against one trained with the proposed method, using the equal error rate (EER) as an objective performance metric. When trained on the VoxCeleb2 dataset, the proposed method achieved an EER of 0.98% on VoxCeleb1-E and 1.84% on VoxCeleb1-H.
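The abstract does not give the loss formula, so the following is only a minimal sketch of one common way to emphasize hard negatives inside a mini-batch, not the authors' formulation. It assumes cosine similarity between utterance embeddings and reweights each anchor's negatives by a softmax over their similarity, so that negatives closest to the anchor dominate the denominator of an InfoNCE-style loss. The function names and the parameters `tau` (temperature) and `beta` (hardness concentration) are hypothetical, and the cross-attention step mentioned in the abstract is not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hard_negative_contrastive_loss(embeddings, speakers, tau=0.1, beta=1.0):
    """InfoNCE-style loss with similarity-weighted (hard) negatives.

    Illustrative assumption, not the paper's loss: for each anchor, every
    utterance from a different speaker is a negative, weighted by
    softmax(beta * similarity). beta = 0 gives uniform weights, which
    reduces to the plain InfoNCE denominator.
    """
    losses = []
    n = len(embeddings)
    for i in range(n):
        pos = [cosine(embeddings[i], embeddings[j])
               for j in range(n) if j != i and speakers[j] == speakers[i]]
        neg = [cosine(embeddings[i], embeddings[k])
               for k in range(n) if speakers[k] != speakers[i]]
        if not pos or not neg:
            continue  # anchor has no usable positive or negative in this batch
        # Importance weights: negatives more similar to the anchor count more.
        weights = [math.exp(beta * s) for s in neg]
        total = sum(weights)
        neg_term = len(neg) * sum(
            (w / total) * math.exp(s / tau) for w, s in zip(weights, neg)
        )
        for s_pos in pos:
            p = math.exp(s_pos / tau)
            losses.append(-math.log(p / (p + neg_term)))
    if not losses:
        raise ValueError("batch contains no anchor with both a positive and a negative")
    return sum(losses) / len(losses)


# Toy batch: speakers "a" and "b" are well separated; "c" lies close to "a".
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.3]]
labels = ["a", "a", "b", "b", "c"]
plain = hard_negative_contrastive_loss(batch, labels, beta=0.0)
hard = hard_negative_contrastive_loss(batch, labels, beta=5.0)
print(plain, hard)
```

In this toy batch the weighted loss is larger than the uniform one, because the single utterance from speaker "c" sits close to speaker "a" and receives most of the weight; that is the behavior a hard-negative scheme is meant to produce.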

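The equal error rate (EER) reported in the abstract is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The sketch below shows one simple way to estimate it from verification scores; it assumes higher scores mean "same speaker" and sweeps only the observed scores as thresholds, whereas evaluation toolkits typically interpolate along the error curve. The function name and the toy scores are hypothetical and unrelated to the paper's results.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and the threshold at which it occurs.

    target_scores: scores of same-speaker trials.
    nontarget_scores: scores of different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]


# Toy scores: one genuine trial (0.4) scores below one impostor trial (0.6).
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.6
```

At threshold 0.6 one of four genuine trials is rejected and one of four impostor trials is accepted, so both error rates are 25%.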

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d296/11478696/828dfa452e0b/sensors-24-06213-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d296/11478696/e2c0cc7798ba/sensors-24-06213-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d296/11478696/b2bc50d9ecf9/sensors-24-06213-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d296/11478696/13f9df06dd0d/sensors-24-06213-g003.jpg

Similar articles

1. Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.
Sensors (Basel). 2024 Sep 25;24(19):6213. doi: 10.3390/s24196213.
2. H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model.
Neural Netw. 2021 Oct;142:329-339. doi: 10.1016/j.neunet.2021.05.024. Epub 2021 May 25.
3. Learning speaker-specific characteristics with a deep neural architecture.
IEEE Trans Neural Netw. 2011 Nov;22(11):1744-56. doi: 10.1109/TNN.2011.2167240. Epub 2011 Sep 26.
4. Speaker recognition based on deep learning: An overview.
Neural Netw. 2021 Aug;140:65-99. doi: 10.1016/j.neunet.2021.03.004. Epub 2021 Mar 17.
5. Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.
J Acoust Soc Am. 2020 Nov;148(5):2912. doi: 10.1121/10.0002455.
6. Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.
Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
7. Partially supervised speaker clustering.
IEEE Trans Pattern Anal Mach Intell. 2012 May;34(5):959-71. doi: 10.1109/TPAMI.2011.174.
8. Few-shot short utterance speaker verification using meta-learning.
PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
9. Cluster-Based Pairwise Contrastive Loss for Noise-Robust Speech Recognition.
Sensors (Basel). 2024 Apr 17;24(8):2573. doi: 10.3390/s24082573.
10. Cost-sensitive learning for emotion robust speaker recognition.
ScientificWorldJournal. 2014;2014:628516. doi: 10.1155/2014/628516. Epub 2014 Jun 4.
