Shu-Kay Ng, Geoffrey John McLachlan
Department of Mathematics, University of Queensland, Brisbane QLD 4072, Australia.
IEEE Trans Neural Netw. 2004 May;15(3):738-49. doi: 10.1109/TNN.2004.826217.
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exist some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adapted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where applying the EM algorithm to train MLP networks may be of limited value, and we discuss some ways of handling the resulting difficulties. For ME networks, it has been reported in the literature that networks trained by the EM algorithm, with an iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often perform poorly in multiclass classification. However, we find that the convergence of the IRLS algorithm is stable and that the log likelihood increases monotonically when a learning rate smaller than one is adopted. We also propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks; its performance is shown to be superior to that of the IRLS algorithm on several simulated and real data sets.
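To make the role of the learning rate concrete, the following is a minimal sketch (our own illustration, not the authors' code) of a damped IRLS/Newton update for a multiclass logistic model, the kind of fit performed in the inner loop of the M-step for the gating and expert networks of an ME architecture. The function name `irls_multinomial` and the default `eta=0.5` are illustrative choices; a learning rate `eta` below one is the damping the abstract credits with stable, monotone convergence.

```python
# Sketch of a damped IRLS (Newton) update for multinomial logistic regression.
# This is an illustration under our own assumptions, not the authors' code.
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def irls_multinomial(X, Y, eta=0.5, n_iter=50, tol=1e-8):
    """Fit a K-class logistic model by damped IRLS/Newton steps.

    X : (n, d) design matrix; Y : (n, K) one-hot labels.
    Class K is the reference class (its weights are fixed at zero),
    so the Hessian of the remaining (K-1)*d free parameters is nonsingular.
    """
    n, d = X.shape
    K = Y.shape[1]
    W = np.zeros((K - 1, d))                      # free weights; class K is reference
    prev_ll = -np.inf
    for _ in range(n_iter):
        Z = np.hstack([X @ W.T, np.zeros((n, 1))])
        P = softmax(Z)                            # fitted class probabilities
        ll = np.sum(Y * np.log(P + 1e-300))       # multinomial log likelihood
        if ll - prev_ll < tol:                    # stop once the increase stalls
            break
        prev_ll = ll
        # Gradient of the log likelihood w.r.t. the flattened free weights.
        g = ((Y - P)[:, :K - 1].T @ X).ravel()
        # Negative Hessian (Fisher information), block (k, j) is
        # X^T diag(P_k * (delta_kj - P_j)) X.
        H = np.zeros(((K - 1) * d, (K - 1) * d))
        for k in range(K - 1):
            for j in range(K - 1):
                w_kj = P[:, k] * ((k == j) - P[:, j])
                H[k*d:(k+1)*d, j*d:(j+1)*d] = X.T @ (w_kj[:, None] * X)
        # Damped Newton step: with eta < 1 the log likelihood increases
        # monotonically in practice, as the abstract reports.
        step = np.linalg.solve(H + 1e-8 * np.eye(H.shape[0]), g)
        W += eta * step.reshape(K - 1, d)
    return W
```

With `eta=1.0` this reduces to the undamped Newton/IRLS step, which can overshoot and cause the log likelihood to oscillate in multiclass problems; shrinking `eta` trades per-iteration progress for the monotone behavior described above.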