Lee Sharon X, Leemaqz Kaleb L, McLachlan Geoffrey J
IEEE Trans Neural Netw Learn Syst. 2018 Nov;29(11):5581-5591. doi: 10.1109/TNNLS.2018.2805317. Epub 2018 Mar 9.
Finite mixtures of skew distributions provide a flexible tool for modeling heterogeneous data with asymmetric distributional features. However, parameter estimation via the Expectation-Maximization (EM) algorithm can become very time-consuming because the E-step involves complicated expressions that are numerically expensive to evaluate. While parallelizing the EM algorithm can offer considerable speedup, current implementations focus almost exclusively on distributed platforms. In this paper, we consider instead the most typical operating environment for users of mixture models: a standalone multicore machine and the R programming environment. We develop a block implementation of the EM algorithm that allows the calculations in the E- and M-steps to be spread across a number of threads. We focus on the fitting of finite mixtures of multivariate skew normal and skew t-distributions, and show that both the E- and M-steps in the EM algorithm can be modified to allow the data to be split into blocks. Our approach is easy to implement and provides immediate benefits to users of multicore machines. Experiments were conducted on two real data sets to demonstrate the effectiveness of the proposed approach.
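To make the block idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a much simpler model than the one treated in the paper (a univariate normal mixture rather than a multivariate skew mixture) and uses Python rather than R. All function names (`normal_pdf`, `block_e_step`, `block_em`) and the strided block split are hypothetical choices made for illustration. The point it illustrates is that each block can compute its own partial sums independently, and the M-step then needs only the combined sums.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def block_e_step(block, weights, means, variances):
    """E-step on one block: partial sums per component plus the block log-likelihood."""
    g = len(weights)
    s0 = [0.0] * g  # sum of responsibilities
    s1 = [0.0] * g  # sum of responsibility * x
    s2 = [0.0] * g  # sum of responsibility * x^2
    loglik = 0.0
    for x in block:
        dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(g)]
        total = sum(dens)
        loglik += math.log(total)
        for k in range(g):
            r = dens[k] / total  # posterior probability of component k for x
            s0[k] += r
            s1[k] += r * x
            s2[k] += r * x * x
    return s0, s1, s2, loglik


def block_em(data, weights, means, variances, n_blocks=4, n_iter=50):
    """EM in which every E-step is evaluated block by block on a thread pool."""
    n, g = len(data), len(weights)
    blocks = [data[i::n_blocks] for i in range(n_blocks)]  # assumed strided split
    loglik = float("-inf")
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        for _ in range(n_iter):
            results = list(pool.map(block_e_step, blocks, repeat(weights),
                                    repeat(means), repeat(variances)))
            # Combine the partial sums returned by the blocks.
            s0 = [sum(r[0][k] for r in results) for k in range(g)]
            s1 = [sum(r[1][k] for r in results) for k in range(g)]
            s2 = [sum(r[2][k] for r in results) for k in range(g)]
            # Log-likelihood at the parameters used in this E-step.
            loglik = sum(r[3] for r in results)
            # M-step: closed-form updates from the combined sums.
            weights = [s0[k] / n for k in range(g)]
            means = [s1[k] / s0[k] for k in range(g)]
            variances = [max(s2[k] / s0[k] - means[k] ** 2, 1e-8) for k in range(g)]
    return weights, means, variances, loglik


# Synthetic two-component data, generated only for this illustration.
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
fit = block_em(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], n_blocks=4)
print(fit)
```

The thread pool here only shows the structure of the computation. Under CPython's global interpreter lock, pure-Python arithmetic does not run in parallel across threads, so a real speedup would need a process pool or native code. Because the blocks communicate only through their partial sums, the number of blocks should affect the result only through floating-point rounding.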