Self-Supervised Open-Set Speaker Recognition with Laguerre-Voronoi Descriptors.

Authors

Ohi Abu Quwsar, Gavrilova Marina L

Affiliation

Department of Computer Science, University of Calgary, Calgary, AB T2N1N4, Canada.

Publication

Sensors (Basel). 2024 Mar 21;24(6):1996. doi: 10.3390/s24061996.

DOI: 10.3390/s24061996
PMID: 38544258
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10975617/

Abstract

Speaker recognition is a challenging problem in behavioral biometrics that has been rigorously investigated over the last decade. Although numerous supervised closed-set systems inherit the power of deep neural networks, limited studies have been made on open-set speaker recognition. This paper proposes a self-supervised open-set speaker recognition that leverages the geometric properties of speaker distribution for accurate and robust speaker verification. The proposed framework consists of a deep neural network incorporating a wider viewpoint of temporal speech features and Laguerre-Voronoi diagram-based speech feature extraction. The deep neural network is trained with a specialized clustering criterion that only requires positive pairs during training. The experiments validated that the proposed system outperformed current state-of-the-art methods in open-set speaker recognition and cluster representation.
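
The abstract does not spell out how the Laguerre-Voronoi descriptors are computed, so the following is only a minimal sketch of the underlying geometric idea, not the authors' method. A Laguerre-Voronoi (power) diagram assigns a point to the weighted site with the smallest power distance, that is, squared Euclidean distance minus the site's weight (its squared radius). The function names, the toy 2-D "embeddings", and the rejection rule `is_unknown` are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

# A site is (center, weight); the weight plays the role of a squared radius.
Site = Tuple[Sequence[float], float]


def power_distance(x: Sequence[float], center: Sequence[float], weight: float) -> float:
    """Laguerre (power) distance: squared Euclidean distance minus the site weight."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - weight


def laguerre_cell(x: Sequence[float], sites: List[Site]) -> int:
    """Return the index of the site whose Laguerre-Voronoi cell contains x."""
    return min(range(len(sites)), key=lambda i: power_distance(x, *sites[i]))


def is_unknown(x: Sequence[float], sites: List[Site], threshold: float = 0.0) -> bool:
    """Hypothetical open-set rule: reject x if it lies outside every site's sphere,
    i.e. its smallest power distance exceeds the threshold."""
    best = laguerre_cell(x, sites)
    return power_distance(x, *sites[best]) > threshold


# Toy example: two "speaker clusters" with different spreads (assumed values).
sites: List[Site] = [((0.0, 0.0), 1.0), ((4.0, 0.0), 9.0)]
```

With these toy values, the point `(1.5, 0.0)` is closer to the first center in plain Euclidean terms, yet it falls in the second site's cell because that site has the larger weight. This is the property that distinguishes a Laguerre-Voronoi diagram from an ordinary Voronoi diagram. A point far from both sites, such as `(0.0, 10.0)`, is rejected by the illustrative open-set rule.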

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/0c38d8d77142/sensors-24-01996-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/1b21ab533c74/sensors-24-01996-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/27ea184f23ba/sensors-24-01996-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/e4077d25f294/sensors-24-01996-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/3da874468a7b/sensors-24-01996-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/9380bbce9030/sensors-24-01996-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/12981525ea95/sensors-24-01996-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e354/10975617/9f06e45c8838/sensors-24-01996-g008.jpg

Similar articles

1
Self-Supervised Open-Set Speaker Recognition with Laguerre-Voronoi Descriptors.
Sensors (Basel). 2024 Mar 21;24(6):1996. doi: 10.3390/s24061996.
2
Speaker recognition based on deep learning: An overview.
Neural Netw. 2021 Aug;140:65-99. doi: 10.1016/j.neunet.2021.03.004. Epub 2021 Mar 17.
3
Partially supervised speaker clustering.
IEEE Trans Pattern Anal Mach Intell. 2012 May;34(5):959-71. doi: 10.1109/TPAMI.2011.174.
4
Learning speaker-specific characteristics with a deep neural architecture.
IEEE Trans Neural Netw. 2011 Nov;22(11):1744-56. doi: 10.1109/TNN.2011.2167240. Epub 2011 Sep 26.
5
Combination of deep speaker embeddings for diarisation.
Neural Netw. 2021 Sep;141:372-384. doi: 10.1016/j.neunet.2021.04.020. Epub 2021 Apr 21.
6
Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.
J Acoust Soc Am. 2020 Nov;148(5):2912. doi: 10.1121/10.0002455.
7
Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.
Sensors (Basel). 2024 Sep 25;24(19):6213. doi: 10.3390/s24196213.
8
Semi Supervised Learning with Deep Embedded Clustering for Image Classification and Segmentation.
IEEE Access. 2019;7:11093-11104. doi: 10.1109/ACCESS.2019.2891970. Epub 2019 Jan 9.
9
Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition.
Sensors (Basel). 2020 Nov 23;20(22):6688. doi: 10.3390/s20226688.
10
Attention-Based Temporal-Frequency Aggregation for Speaker Verification.
Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.

References cited in this article

1
Enhancing Human Activity Recognition in Smart Homes with Self-Supervised Learning and Self-Attention.
Sensors (Basel). 2024 Jan 29;24(3):884. doi: 10.3390/s24030884.
2
Recent Advances in Open Set Recognition: A Survey.
IEEE Trans Pattern Anal Mach Intell. 2021 Oct;43(10):3614-3631. doi: 10.1109/TPAMI.2020.2981604. Epub 2021 Sep 2.