Speaker recognition based on deep learning: An overview.

Affiliation

Center of Intelligent Acoustics and Immersive Communications (CIAIC) and the School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an Shaanxi 710072, China.

Publication information

Neural Netw. 2021 Aug;140:65-99. doi: 10.1016/j.neunet.2021.03.004. Epub 2021 Mar 17.

DOI:10.1016/j.neunet.2021.03.004

PMID:33744714

Abstract

Speaker recognition is the task of identifying persons from their voices. Recently, deep learning has dramatically revolutionized speaker recognition. However, there is a lack of comprehensive reviews of this exciting progress. In this paper, we review several major subtasks of speaker recognition, including speaker verification, identification, diarization, and robust speaker recognition, with a focus on deep-learning-based methods. Because the major advantage of deep learning over conventional methods is its representation ability, which can produce highly abstract embedding features from utterances, we first pay close attention to deep-learning-based speaker feature extraction, including the inputs, network structures, temporal pooling strategies, and objective functions, which are the fundamental components of many speaker recognition subtasks. Then, we give an overview of speaker diarization, with an emphasis on recent supervised, end-to-end, and online diarization. Finally, we survey robust speaker recognition from the perspectives of domain adaptation and speech enhancement, which are two major approaches to dealing with domain mismatch and noise problems. Popular and recently released corpora are listed at the end of the paper.
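
To make the role of temporal pooling concrete, the following is a minimal sketch, not taken from the paper, of one widely used pooling strategy (statistics pooling: per-dimension mean and standard deviation over frames) followed by cosine scoring for verification. The function names, the toy frame values, and the 0.5 decision threshold are all assumptions chosen for illustration; in a real system the frame-level vectors would come from a trained neural network and the threshold would be tuned on held-out data.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Map a variable-length list of frame vectors to one fixed-length vector.

    The result is the per-dimension mean followed by the per-dimension
    (population) standard deviation, so its length is 2 * feature_dim
    regardless of how many frames the utterance has.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two utterance-level embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as "no similarity"
    return dot / (norm_a * norm_b)


def same_speaker(enroll_frames, test_frames, threshold: float = 0.5) -> bool:
    """Verification decision; the threshold value is a hypothetical placeholder."""
    enroll = statistics_pooling(enroll_frames)
    test = statistics_pooling(test_frames)
    return cosine_score(enroll, test) >= threshold


# Toy usage: two "utterances" of different lengths yield embeddings of equal size.
short_utt = [[1.0, 2.0], [3.0, 6.0]]
long_utt = [[1.0, 2.0], [3.0, 6.0], [1.0, 2.0], [3.0, 6.0]]
pooled_short = statistics_pooling(short_utt)
pooled_long = statistics_pooling(long_utt)
```

Because pooling discards the time axis, utterances of any duration can be compared with a single similarity score, which is why the pooled embedding can serve as a shared front end for verification, identification, and diarization.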
