IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4637-4649. doi: 10.1109/TPAMI.2022.3195462. Epub 2023 Mar 7.
Principal components analysis has long been used to reduce the dimensionality of datasets. In this paper, we demonstrate that in mode detection the components of smallest variance, the pettiest components, are more important. We prove that for a multivariate normal or Laplace distribution, "pettiest component analysis" yields boxes of optimal volume, in the sense that their volume is minimal over all possible boxes with the same number of dimensions and fixed probability. This reduction in volume produces an information gain that is measured using active information. We illustrate our results with a simulation and with a search for modal patterns in digitized images of handwritten numbers from the MNIST database; in both cases pettiest components work better than their competitors. In fact, we show that modes obtained with pettiest components generate better-written digits for MNIST than those obtained with principal components.
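The volume claim in the abstract can be illustrated with a small numerical sketch for the multivariate normal case. It is not the paper's code or proof. It assumes the covariance matrix has already been diagonalized, that the box is centered at the mode with sides parallel to the chosen eigenvectors, and that each half-width is the same multiple `z` of the standard deviation along its axis. The eigenvalues, the number of retained dimensions `k`, the probability `prob`, and all function names are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def half_width_scale(prob, k):
    """Return z such that a centered box with half-widths z * sigma_i along
    k independent normal axes has total probability `prob`."""
    per_axis = prob ** (1.0 / k)  # each axis carries an equal share
    return NormalDist().inv_cdf((1.0 + per_axis) / 2.0)


def box_probability(z, k):
    """Probability of the box [-z*sigma_i, z*sigma_i]^k under independent normals."""
    per_axis = 2.0 * NormalDist().cdf(z) - 1.0
    return per_axis ** k


def box_volume(variances, prob):
    """Volume of the box built on the given component variances."""
    z = half_width_scale(prob, len(variances))
    return math.prod(2.0 * z * math.sqrt(v) for v in variances)


# Hypothetical eigenvalues of a covariance matrix (variance along each eigenvector).
eigenvalues = [9.0, 4.0, 1.0, 0.25, 0.04]
k = 2        # dimensions kept (assumption)
prob = 0.9   # fixed probability of the box (assumption)

ordered = sorted(eigenvalues)
pettiest = ordered[:k]    # components of smallest variance
principal = ordered[-k:]  # components of largest variance

vol_pettiest = box_volume(pettiest, prob)
vol_principal = box_volume(principal, prob)

# Log-ratio of the two volumes, in bits. This is only a rough indication of
# the gain and is not the paper's active-information calculation.
log_volume_ratio = math.log2(vol_principal / vol_pettiest)

print(f"pettiest box volume:  {vol_pettiest:.4f}")
print(f"principal box volume: {vol_principal:.4f}")
print(f"log2 volume ratio:    {log_volume_ratio:.3f}")
```

Because both boxes use the same `z`, the volume ratio depends only on the product of the standard deviations along the chosen axes, which is why the smallest-variance components give the smaller box at equal probability in this setup.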