Pedersen Michael Syskind, Wang DeLiang, Larsen Jan, Kjems Ulrik
Oticon A/S, Smørum DK-2765, Denmark.
IEEE Trans Neural Netw. 2008 Mar;19(3):475-92. doi: 10.1109/TNN.2007.911740.
Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications, these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model which assumes closely spaced microphones. The method combines two source separation techniques: independent component analysis (ICA) and binary time-frequency (T-F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible to iteratively extract basis speech signals from a convolutive mixture. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can separate, in principle, an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method can segregate speech signals under reverberant conditions, and we compare it to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance, and the extracted signals can be maintained as stereo signals.
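The mask-estimation step can be illustrated with a minimal sketch. It assumes the two ICA outputs are already available as time-domain arrays, and it assigns each T-F unit to whichever output has the larger magnitude there. The comparison rule, the threshold `tau`, the STFT settings, and the function names `stft_mag` and `binary_masks` are illustrative assumptions, not details given in the abstract.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude STFT from Hann-windowed frames; shape is (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def binary_masks(y1, y2, tau=1.0, n_fft=256, hop=128):
    """Estimate one binary T-F mask per ICA output (assumed rule).

    A T-F unit is assigned to output 1 when |Y1| > tau * |Y2|, and to
    output 2 when |Y2| > tau * |Y1|. With tau >= 1 the masks are disjoint.
    """
    Y1 = stft_mag(y1, n_fft, hop)
    Y2 = stft_mag(y2, n_fft, hop)
    return Y1 > tau * Y2, Y2 > tau * Y1

# Toy stand-ins for two ICA outputs: each is dominated by a different tone.
fs = 8000
t = np.arange(fs) / fs
y1 = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
y2 = np.sin(2 * np.pi * 2000 * t) + 0.1 * np.sin(2 * np.pi * 500 * t)
mask1, mask2 = binary_masks(y1, y2)
```

In an iterative scheme of the kind the abstract describes, each mask would be applied to the STFTs of the original microphone signals, and the masked signals would be passed to ICA again. That loop and its stopping criterion are not specified here and are left out of the sketch.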