Missing data imputation through GTM as a mixture of t-distributions.

Author information

Vellido Alfredo

Affiliation

Department of Computing Languages and Systems (LSI), Polytechnic University of Catalonia (UPC), C. Jordi Girona, 1-3. 08034, Barcelona, Spain.

Publication information

Neural Netw. 2006 Dec;19(10):1624-35. doi: 10.1016/j.neunet.2005.11.003. Epub 2006 Mar 31.

DOI:10.1016/j.neunet.2005.11.003

PMID:16580176

Abstract

The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired, Self-Organizing Maps. The GTM can also be interpreted as a constrained mixture of distribution models. In recent years, much attention has been directed towards Student t-distributions as an alternative to Gaussians in mixture models due to their robustness towards outliers. In this paper, the GTM is redefined as a constrained mixture of t-distributions: the t-GTM, and the Expectation-Maximization algorithm that is used to fit the model to the data is modified to carry out missing data imputation. Several experiments show that the t-GTM successfully detects outliers, while minimizing their impact on the estimation of the model parameters. It is also shown that the t-GTM provides an overall more accurate imputation of missing values than the standard Gaussian GTM.
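As a rough illustration of the two ideas the abstract combines (down-weighting outliers under a Student t-distribution and imputing missing values from the fitted mixture), the following minimal sketch may help. It is not the paper's algorithm: the function name `t_mixture_impute` and the parameters `centers`, `beta` (a shared inverse variance) and `nu` (degrees of freedom) are hypothetical, the mixture centres are taken as given rather than constrained by a latent-space mapping as in the GTM, and mixing proportions are assumed equal.

```python
import numpy as np

def t_mixture_impute(X, centers, beta=1.0, nu=3.0):
    """Impute NaNs in X under an isotropic t-mixture with fixed centres.

    Assumptions (not taken from the paper): equal mixing proportions,
    a single inverse variance `beta`, and a single `nu` for all components.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    observed = ~np.isnan(X)                                # (N, D) mask

    # Squared distance to each centre using observed dimensions only.
    diff = np.where(observed[:, None, :],
                    X[:, None, :] - centers[None, :, :], 0.0)
    sq = (diff ** 2).sum(axis=2)                           # (N, K)
    d_obs = observed.sum(axis=1, keepdims=True)            # (N, 1)

    # Log t-density up to a term that is constant within each row.
    log_p = -0.5 * (nu + d_obs) * np.log1p(beta * sq / nu)
    log_p -= log_p.max(axis=1, keepdims=True)              # numerical stability
    R = np.exp(log_p)
    R /= R.sum(axis=1, keepdims=True)                      # responsibilities

    # Robustness weights: small for points far from a centre (outliers).
    U = (nu + d_obs) / (nu + beta * sq)

    # Missing entries take the responsibility-weighted centre coordinates,
    # i.e. the conditional mean under this isotropic mixture.
    X_filled = np.where(observed, X, R @ centers)
    return X_filled, R, U
```

In a full fit, weights like `U` would scale each point's contribution when the centres and `beta` are re-estimated, which is what limits the influence of outliers on the parameters.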
