Ohi Abu Quwsar, Gavrilova Marina L
Department of Computer Science, University of Calgary, Calgary, AB T2N1N4, Canada.
Sensors (Basel). 2024 Mar 21;24(6):1996. doi: 10.3390/s24061996.
Speaker recognition is a challenging problem in behavioral biometrics that has been extensively investigated over the last decade. Although numerous supervised closed-set systems harness the power of deep neural networks, few studies have addressed open-set speaker recognition. This paper proposes a self-supervised open-set speaker recognition framework that leverages the geometric properties of the speaker distribution for accurate and robust speaker verification. The proposed framework consists of a deep neural network that incorporates a wider view of temporal speech features, together with Laguerre-Voronoi diagram-based speech feature extraction. The deep neural network is trained with a specialized clustering criterion that requires only positive pairs during training. Experiments show that the proposed system outperforms current state-of-the-art methods in open-set speaker recognition and cluster representation.
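The abstract names the Laguerre-Voronoi diagram (also called a power diagram) without defining it. As a minimal sketch of the underlying geometry only, and not the authors' implementation, the Python below assigns a point to the weighted site with the smallest power distance, i.e. squared Euclidean distance minus squared radius. The function names, the toy 2-D sites, and the `reject_above` threshold used for the open-set decision are all hypothetical choices made for illustration; the paper's actual features, training criterion, and rejection rule are not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
Site = Tuple[Point, float]  # (center, radius); a hypothetical stand-in for a speaker cluster


def power_distance(x: Point, center: Point, radius: float) -> float:
    """Laguerre (power) distance: squared Euclidean distance minus squared radius."""
    squared = sum((a - b) ** 2 for a, b in zip(x, center))
    return squared - radius ** 2


def laguerre_cell(x: Point, sites: List[Site]) -> int:
    """Index of the site whose Laguerre-Voronoi cell contains x."""
    distances = [power_distance(x, c, r) for c, r in sites]
    return min(range(len(sites)), key=lambda i: distances[i])


def open_set_decision(x: Point, sites: List[Site],
                      reject_above: float = 0.0) -> Optional[int]:
    """Return the winning site index, or None if x looks like an unknown speaker.

    Assumption for illustration: x is rejected when its smallest power
    distance exceeds `reject_above` (0.0 means "outside every site's ball").
    """
    best = laguerre_cell(x, sites)
    center, radius = sites[best]
    if power_distance(x, center, radius) > reject_above:
        return None
    return best


# Toy 2-D example with two weighted sites.
sites: List[Site] = [((0.0, 0.0), 1.0), ((4.0, 0.0), 2.0)]

# (1.8, 0) is closer to site 0 in plain Euclidean distance, but the larger
# radius of site 1 pulls the cell boundary toward site 0.
print(laguerre_cell((1.8, 0.0), sites))
print(open_set_decision((3.0, 0.0), sites))
print(open_set_decision((10.0, 10.0), sites))
```

The radius acts as a per-site weight, so a larger cluster claims more of the space than an ordinary Voronoi diagram would give it. That is the property which makes the diagram a natural fit for clusters of unequal spread.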