
H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model.

Affiliation

Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, UK.

Publication information

Neural Netw. 2021 Oct;142:329-339. doi: 10.1016/j.neunet.2021.05.024. Epub 2021 May 25.

DOI:10.1016/j.neunet.2021.05.024
PMID:34098246
Abstract

In this paper, a hierarchical attention network is proposed to generate robust utterance-level embeddings (H-vectors) for speaker identification and verification. Because different parts of an utterance may contribute differently to speaker identity, the hierarchical structure is designed to learn speaker-related information both locally and globally. In the proposed approach, a frame-level encoder and attention are applied to segments of an input utterance to generate individual segment vectors. Segment-level attention is then applied to the segment vectors to construct an utterance representation. To evaluate the quality of the learned utterance-level speaker embeddings for speaker identification and verification, the proposed approach is tested on several benchmark datasets: NIST SRE2008 Part1, Switchboard Cellular (Part1), CallHome American English Speech, VoxCeleb1, and VoxCeleb2. Compared with several strong baselines, the results show that H-vectors achieve better identification and verification performance under various acoustic conditions.
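
The two-level pooling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame-level encoder is omitted, the segments are assumed to be non-overlapping and of fixed length, and the attention scores are assumed to be a simple dot product with a weight vector. The names `attention_pool`, `h_vector_sketch`, `w_frame`, and `w_seg` are hypothetical, and the input features are random placeholders.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, w):
    """Attention-weighted average of equal-length vectors.

    Scores are dot products with w (an assumed, simplified scoring function).
    Returns the pooled vector and the attention weights.
    """
    scores = [sum(x * wi for x, wi in zip(v, w)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * v[d] for a, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def h_vector_sketch(frames, seg_len, w_frame, w_seg):
    """Frames -> segment vectors (frame-level attention) -> utterance vector
    (segment-level attention). A trailing partial segment is dropped."""
    segments = [
        frames[i:i + seg_len]
        for i in range(0, len(frames) - seg_len + 1, seg_len)
    ]
    seg_vectors = [attention_pool(seg, w_frame)[0] for seg in segments]
    return attention_pool(seg_vectors, w_seg)


# Placeholder data: 100 frames of 8-dimensional features, random weights.
rng = random.Random(0)
frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
w_frame = [rng.gauss(0, 1) for _ in range(8)]
w_seg = [rng.gauss(0, 1) for _ in range(8)]

utt, seg_w = h_vector_sketch(frames, seg_len=20, w_frame=w_frame, w_seg=w_seg)
print(len(utt), len(seg_w))
```

In this sketch the segment-level weights `seg_w` sum to one, so they can be read as the relative contribution of each segment to the utterance vector, which is the intuition behind learning speaker information locally and then globally. In a trained model the weight vectors would be learned jointly with the encoder rather than sampled at random.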

Similar articles

1
H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model.
Neural Netw. 2021 Oct;142:329-339. doi: 10.1016/j.neunet.2021.05.024. Epub 2021 May 25.
2
Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.
Sensors (Basel). 2024 Sep 25;24(19):6213. doi: 10.3390/s24196213.
3
Combination of deep speaker embeddings for diarisation.
Neural Netw. 2021 Sep;141:372-384. doi: 10.1016/j.neunet.2021.04.020. Epub 2021 Apr 21.
4
Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1204-1219. doi: 10.1109/taslp.2021.3061885. Epub 2021 Feb 26.
5
Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.
Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
6
Few-shot short utterance speaker verification using meta-learning.
PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
7
Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity.
Sci Rep. 2023 Jul 10;13(1):11155. doi: 10.1038/s41598-023-35184-7.
8
Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.
J Acoust Soc Am. 2020 Nov;148(5):2912. doi: 10.1121/10.0002455.
9
Lambda-vector modeling temporal and channel interactions for text-independent speaker verification.
Sci Rep. 2022 Oct 28;12(1):18171. doi: 10.1038/s41598-022-22977-5.
10
Attention-Based Temporal-Frequency Aggregation for Speaker Verification.
Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.

Cited by

1
Lambda-vector modeling temporal and channel interactions for text-independent speaker verification.
Sci Rep. 2022 Oct 28;12(1):18171. doi: 10.1038/s41598-022-22977-5.