Ahmed Imtiaz, Galoppo Travis, Hu Xia, Ding Yu
IEEE Trans Pattern Anal Mach Intell. 2022 Aug;44(8):4110-4124. doi: 10.1109/TPAMI.2021.3066111. Epub 2022 Jul 1.
Dimensionality reduction is a crucial first step for many unsupervised learning tasks, including anomaly detection and clustering. The autoencoder is a popular mechanism for accomplishing dimensionality reduction. To make dimensionality reduction effective for high-dimensional data embedding a nonlinear low-dimensional manifold, it is understood that some form of geodesic distance metric should be used to discriminate among the data samples. Inspired by the success of geodesic distance approximators such as ISOMAP, we propose to use a minimum spanning tree (MST), a graph-based algorithm, to approximate the local neighborhood structure and generate structure-preserving distances among data points. We use this MST-based distance metric to replace the Euclidean distance metric in the embedding function of autoencoders and develop a new graph regularized autoencoder, which outperforms a wide range of alternative methods over 20 benchmark anomaly detection datasets. We further incorporate the MST regularizer into two generative adversarial networks and find that it substantially improves anomaly detection performance for both. We also test our MST regularized autoencoder on two datasets in a clustering application and observe superior performance there as well.
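To illustrate what an MST-based distance can look like, the following is a minimal sketch rather than the authors' implementation. The abstract does not state how the distance is computed from the tree, so the sketch assumes one plausible choice: the distance between two points is the length of the unique path connecting them in the MST of the complete Euclidean graph. The function names (`euclidean`, `minimum_spanning_tree`, `mst_path_distances`) and the toy points are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Straight-line distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of tree edges (i, j, weight) over point indices.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each node to the tree
    parent = [-1] * n       # tree-side endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the attachment cost of every remaining node.
        for v in range(n):
            if not in_tree[v]:
                w = euclidean(points[u], points[v])
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def mst_path_distances(points):
    """All-pairs path lengths along the MST (assumed definition of the MST distance)."""
    n = len(points)
    adj = defaultdict(list)
    for i, j, w in minimum_spanning_tree(points):
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [[0.0] * n for _ in range(n)]
    for src in range(n):
        # Depth-first walk; a tree has exactly one path between any two nodes.
        stack = [(src, -1, 0.0)]
        while stack:
            node, prev, d = stack.pop()
            dist[src][node] = d
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, d + w))
    return dist


# Toy data: four points laid out along a bent, U-like curve.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 1.0), (2.0, -0.5)]
dist = mst_path_distances(points)
print("Euclidean distance between the endpoints:", euclidean(points[0], points[3]))
print("MST path distance between the endpoints:", dist[0][3])
```

In this toy layout the two endpoints are close in straight-line terms but far apart when measured along the tree, which is the kind of structure-preserving behavior a geodesic approximation is meant to capture. How such distances enter the autoencoder's regularizer is not specified in the abstract and is left out of the sketch.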