NTT Communication Science Laboratories, 2-4 Hikaridai, Seika-cho, 619-0237 Kyoto, Japan.
J Acoust Soc Am. 2013 May;133(5):EL339-45. doi: 10.1121/1.4795851.
This paper introduces an approach to online speech source clustering and separation that exploits multichannel location information within a recursive expectation-maximization (EM) algorithm. Specifically, the normalized multichannel speech-recording vector is used as the feature vector and is modeled with a Watson mixture model. The model parameters are estimated online by maximizing the data likelihood at every time-frequency slot, so the proposed approach can continuously adjust the speech clusters. In experiments with two speakers, including speaker movement, promising results show the advantage of the proposed approach over the batch EM algorithm.
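The recursive EM idea can be illustrated for a single frequency bin with the minimal Python sketch below. It is not the authors' implementation, and it rests on simplifying assumptions:

- Each cluster k is modeled by a complex Watson density proportional to exp(kappa |a_k^H z|^2), where z is the unit-norm observation and a_k is the cluster's unit-norm mode vector.
- All clusters share one fixed concentration `kappa`. The Watson normalizing constant therefore cancels in the E-step and is never computed.
- The sufficient statistics are updated recursively with an exponential forgetting factor `forget`.
- The names `OnlineWatsonMixture`, `kappa`, `forget`, and `init_modes` are hypothetical.

After each E-step, every mode is re-estimated as the principal eigenvector of that cluster's posterior-weighted scatter matrix. This is the maximum-likelihood mode update for a Watson distribution with positive concentration.

```python
import numpy as np


class OnlineWatsonMixture:
    """Recursive (online) EM for a complex Watson mixture at ONE frequency bin.

    Hypothetical sketch: `kappa` (shared, fixed concentration), `forget`
    (forgetting factor) and `init_modes` are illustrative choices, not
    parameters taken from the paper.
    """

    def __init__(self, n_mics, n_clusters, kappa=20.0, forget=0.98,
                 init_modes=None, seed=0):
        if init_modes is None:
            rng = np.random.default_rng(seed)
            init_modes = (rng.standard_normal((n_clusters, n_mics))
                          + 1j * rng.standard_normal((n_clusters, n_mics)))
        modes = np.asarray(init_modes, dtype=complex)
        self.modes = modes / np.linalg.norm(modes, axis=1, keepdims=True)  # a_k
        self.kappa = kappa
        self.forget = forget
        # Running sufficient statistics: soft counts n_k and scatter matrices R_k.
        self.counts = np.full(n_clusters, 1.0 / n_clusters)
        self.scatter = np.einsum("k,km,kn->kmn", self.counts,
                                 self.modes, self.modes.conj())

    @staticmethod
    def feature(x):
        """Normalize a multichannel observation x (length n_mics) to unit norm."""
        return x / max(np.linalg.norm(x), 1e-12)

    def step(self, x):
        """Update the model with one time-frequency observation.

        Returns the cluster posteriors, which can serve as separation masks.
        """
        z = self.feature(np.asarray(x, dtype=complex))
        # E-step: with a shared kappa the Watson normalizer cancels across clusters,
        # so log p(k | z) = log alpha_k + kappa * |a_k^H z|^2 + const.
        proj = np.abs(self.modes.conj() @ z) ** 2
        log_post = np.log(self.counts / self.counts.sum()) + self.kappa * proj
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # Recursive M-step: forget old statistics, then add the new weighted sample.
        self.counts = self.forget * self.counts + post
        self.scatter = (self.forget * self.scatter
                        + post[:, None, None] * np.outer(z, z.conj()))
        # Mode a_k = principal eigenvector of R_k (eigh sorts eigenvalues ascending).
        for k in range(len(self.counts)):
            _, vecs = np.linalg.eigh(self.scatter[k])
            self.modes[k] = vecs[:, -1]
        return post
```

In a full separator, this update would run independently at every frequency bin, and the returned posteriors would act as time-frequency masks. A separate step (not shown) would be needed to align the cluster indices across frequencies. The forgetting factor sets the trade-off for moving speakers: values closer to 1 give smoother estimates but slower tracking.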