Suppr 超能文献



Lambda-vector modeling temporal and channel interactions for text-independent speaker verification.

Affiliations

College of Intelligent Equipment, Shandong University of Science and Technology, Taian, 271019, Shandong, China.

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, Shandong, China.

Publication Info

Sci Rep. 2022 Oct 28;12(1):18171. doi: 10.1038/s41598-022-22977-5.

DOI: 10.1038/s41598-022-22977-5
PMID: 36307520
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9616814/
Abstract

Most current high-performing speaker verification models are ResNet-based deep models or attention-based models. These models share a general weakness: a large number of parameters and high hardware requirements. Moreover, many deep architectures generate embedding features only from the output of the last frame-level layer, so shallow features and channel-related features are ignored. To address these problems, this paper proposes a shallow speaker verification model based on the Lambda-vector, whose main structure is composed of three Lambda-SE modules. Each module extracts long-distance dependencies between frame-level features together with channel-related interaction information to enhance the feature representation. In addition, to adequately mine the information in both deep and shallow features, the model introduces multi-layer feature aggregation to fuse the features of different frame-level layers. This enriches the detailed information in the deep features and improves the model's ability to represent complex information. Experimental results on the public datasets Voxceleb1 and Voxceleb2 show that the model achieves more stable training, fewer model parameters, and better identification performance than the baseline models.
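The abstract combines two well-known mechanisms: a lambda layer (which summarizes the whole frame sequence into a single linear function and applies it to every frame, giving long-distance interactions at linear cost) and squeeze-and-excitation (SE) channel gating. The paper's actual Lambda-SE module is not reproduced here; the following is only a minimal NumPy sketch of these two building blocks, with untrained random weights standing in for learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lambda_content(q, k, v):
    # Content lambda: softmax-normalized keys summarize the whole
    # sequence into one (key_dim x value_dim) map, applied per query.
    # q: (n, dk), k: (n, dk), v: (n, dv); cost is linear in n.
    lam = softmax(k, axis=0).T @ v   # (dk, dv)
    return q @ lam                   # (n, dv)

def se_gate(x, reduction=4, seed=0):
    # Squeeze-and-excitation: global-average pool per channel, a
    # bottleneck MLP (random weights here, learned in practice),
    # then a sigmoid gate that rescales each channel.
    n, c = x.shape
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c, c // reduction)) / np.sqrt(c)
    w2 = rng.standard_normal((c // reduction, c)) / np.sqrt(c // reduction)
    s = x.mean(axis=0)                                    # squeeze: (c,)
    g = 1.0 / (1.0 + np.exp(-(np.maximum(s @ w1, 0) @ w2)))  # excite: (c,)
    return x * g                                          # reweight channels

# Toy frame-level features: 50 frames, 16 channels.
rng = np.random.default_rng(1)
x = rng.standard_normal((50, 16))
y = se_gate(lambda_content(x, x, x))
print(y.shape)  # (50, 16)
```

Multi-layer feature aggregation, as described, would then concatenate (or otherwise fuse) the outputs of several such frame-level stages before pooling into the utterance-level embedding.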


Figures:
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/74df4fd6de9d/41598_2022_22977_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/ce9cd2029801/41598_2022_22977_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/ce55037d8dea/41598_2022_22977_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/ce1048e60f8b/41598_2022_22977_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/be56adbe19c7/41598_2022_22977_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/877cfd6f9f62/41598_2022_22977_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/040d/9616814/e884a483190f/41598_2022_22977_Fig7_HTML.jpg

Similar Articles

1. Lambda-vector modeling temporal and channel interactions for text-independent speaker verification. Sci Rep. 2022 Oct 28;12(1):18171. doi: 10.1038/s41598-022-22977-5.
2. H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model. Neural Netw. 2021 Oct;142:329-339. doi: 10.1016/j.neunet.2021.05.024. Epub 2021 May 25.
3. Attention-Based Temporal-Frequency Aggregation for Speaker Verification. Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.
4. Bidirectional Attention for Text-Dependent Speaker Verification. Sensors (Basel). 2020 Nov 27;20(23):6784. doi: 10.3390/s20236784.
5. Multi-level Feature Interaction and Efficient Non-Local Information Enhanced Channel Attention for image dehazing. Neural Netw. 2023 Jun;163:10-27. doi: 10.1016/j.neunet.2023.03.017. Epub 2023 Mar 17.
6. Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection. Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
7. Few-shot short utterance speaker verification using meta-learning. PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
8. D-MONA: A dilated mixed-order non-local attention network for speaker and language recognition. Neural Netw. 2021 Jul;139:201-211. doi: 10.1016/j.neunet.2021.03.014. Epub 2021 Mar 18.
9. Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Comput Speech Lang. 2016 Mar;36:196-211. doi: 10.1016/j.csl.2015.05.003. Epub 2015 May 22.
10. Audio-Visual Fusion Based on Interactive Attention for Person Verification. Sensors (Basel). 2023 Dec 15;23(24):9845. doi: 10.3390/s23249845.
