Nippon Telegraph and Telephone Corporation, Kanagawa, 243-0198, Japan
University of Tsukuba, Ibaraki, 305-8577, Japan
Neural Comput. 2019 Sep;31(9):1891-1914. doi: 10.1162/neco_a_01217. Epub 2019 Jul 23.
This letter proposes a multichannel source separation technique, the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class labels, we can use the trained decoder distribution as a universal generative model capable of generating spectrograms conditioned on a specified class index. By treating the latent space variables and the class index as the unknown parameters of this generative model, we can develop a convergence-guaranteed algorithm for supervised determined source separation that consists of iteratively estimating the power spectrograms of the underlying sources, as well as the separation matrices. In experimental evaluations, our MVAE produced better separation performance than a baseline method.
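The alternating scheme described above (estimate the source power spectrograms, then update the separation matrices) can be illustrated with a minimal NumPy sketch of the separation-matrix step alone. The abstract does not name the update rule, so the sketch assumes iterative projection, a common choice in determined source separation. The power spectrograms `V`, which the method would obtain from the trained CVAE decoder, are replaced here by random positive placeholders. The function names `weighted_cov` and `ip_update`, the array layouts, and the `eps` regularizer are all hypothetical choices made for illustration.

```python
import numpy as np

def weighted_cov(X, v):
    """Weighted covariance per frequency: (1/N) * sum_n x x^H / v(f, n).
    X: (F, N, M) complex mixture STFT, v: (F, N) positive source powers."""
    Xw = X / v[:, :, None]
    return np.einsum('fnm,fnk->fmk', Xw, X.conj()) / X.shape[1]

def ip_update(X, W, V, eps=1e-9):
    """One iterative-projection sweep over all sources (assumed update rule).
    W: (F, M, M) separation matrices whose j-th row is w_j^H.
    V: (M, F, N) current power spectrogram estimate for each source."""
    F, N, M = X.shape
    for j in range(M):
        Sigma = weighted_cov(X, V[j])                    # (F, M, M)
        e = np.zeros((F, M, 1), dtype=complex)
        e[:, j, 0] = 1.0
        # w_j = (W Sigma_j)^{-1} e_j, solved for every frequency bin at once
        u = np.linalg.solve(W @ Sigma + eps * np.eye(M), e)[..., 0]
        # Rescale so that w_j^H Sigma_j w_j = 1
        q = np.real(np.einsum('fm,fmk,fk->f', u.conj(), Sigma, u))
        u = u / np.sqrt(np.maximum(q, eps))[:, None]
        W[:, j, :] = u.conj()                            # store as row w_j^H
    return W

# Toy data: F frequency bins, N frames, M channels (= M sources, determined case)
rng = np.random.default_rng(0)
F, N, M = 4, 50, 2
X = rng.standard_normal((F, N, M)) + 1j * rng.standard_normal((F, N, M))
V = rng.uniform(0.5, 2.0, size=(M, F, N))   # placeholder for CVAE decoder output
W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))

W = ip_update(X, W, V)
Y = np.einsum('fjm,fnm->fnj', W, X)         # separated signals y = W x
```

In a full separation loop, this sweep would alternate with a step that re-estimates `V` from the generative model (the latent variables and class index, as the abstract describes) until the objective stops improving.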