Lei Lei, She Kun
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China.
Entropy (Basel). 2018 Aug 13;20(8):600. doi: 10.3390/e20080600.
Recently, the accuracy of voice authentication systems has increased significantly due to the successful application of the identity vector (i-vector) model. This paper proposes a new method for i-vector extraction. In the method, a perceptual wavelet packet transform (PWPT) is designed to convert speech utterances into wavelet entropy feature vectors, and a Convolutional Neural Network (CNN) is designed to estimate the frame posteriors of the wavelet entropy feature vectors. In the end, the i-vector is extracted based on those frame posteriors. The TIMIT and VoxCeleb speech corpora are used for experiments, and the experimental results show that the proposed method can extract appropriate i-vectors, which reduces the equal error rate (EER) and improves the accuracy of voice authentication systems in clean and noisy environments.
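The feature-extraction step described in the abstract (decompose each speech frame with a wavelet packet transform, then take an entropy value per subband) can be sketched as follows. This is an illustrative numpy-only implementation using a plain Haar wavelet packet and Shannon entropy; the paper's PWPT uses a perceptually motivated subband layout, and all function names here are hypothetical.

```python
import numpy as np

def haar_step(x):
    # One Haar analysis step: approximation (low-pass) and detail (high-pass).
    x = x[: len(x) // 2 * 2]  # drop a trailing odd sample if present
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def wavelet_packet_leaves(x, levels):
    # Full wavelet packet decomposition: unlike the plain wavelet transform,
    # every node (not just the approximation) is split at each level,
    # yielding 2**levels leaf subbands.
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [part for n in nodes for part in haar_step(n)]
    return nodes

def shannon_entropy(coeffs, eps=1e-12):
    # Entropy of the normalized coefficient energies within one subband.
    e = coeffs ** 2
    p = e / (e.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())

def wavelet_entropy_vector(frame, levels=3):
    # One entropy value per leaf subband -> the frame's feature vector.
    return np.array([shannon_entropy(c)
                     for c in wavelet_packet_leaves(frame, levels)])

# Example: a 256-sample frame with a 3-level decomposition gives a
# 2**3 = 8-dimensional wavelet entropy feature vector.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
feat = wavelet_entropy_vector(frame)
print(feat.shape)  # (8,)
```

In the proposed pipeline, such per-frame feature vectors would then be fed to the CNN, whose frame posteriors replace the usual GMM-UBM posteriors in the i-vector statistics; that stage is not sketched here.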