Gortler Jochen, Spinner Thilo, Streeb Dirk, Weiskopf Daniel, Deussen Oliver
IEEE Trans Vis Comput Graph. 2020 Jan;26(1):822-831. doi: 10.1109/TVCG.2019.2934812. Epub 2019 Oct 10.
We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to non-linear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the PCA sample covariance matrix that respects potential uncertainty in each of the inputs, building the mathematical foundation of our new method: uncertainty-aware PCA. Beyond the gains in accuracy and performance that our approach achieves over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data. For this, we propose factor traces, a novel visualization that helps users better understand the influence of uncertainty on the chosen principal components. We provide multiple examples of our technique using real-world datasets. As a special case, we show how to propagate multivariate normal distributions through PCA in closed form. Furthermore, we discuss extensions and limitations of our approach.
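The idea of a covariance matrix that accounts for per-input uncertainty can be illustrated with a minimal sketch. It assumes each input is summarized only by a mean vector and a covariance matrix, and it uses the law of total covariance for an equally weighted mixture (covariance of the means plus the average per-input covariance, normalized by 1/N). The function names `uncertainty_aware_covariance` and `ua_pca` are hypothetical, and the normalization may differ from the paper's own derivation. The projection step relies on the standard fact that a linear map W sends N(μ, Σ) to N(Wᵀμ, WᵀΣW).

```python
import numpy as np

def uncertainty_aware_covariance(means, covs):
    """Covariance of a set of uncertain points (assumed formulation).

    means: array of shape (N, d), one mean vector per input distribution
    covs:  array of shape (N, d, d), one covariance matrix per input
    Returns the (d, d) covariance matrix and the (d,) global center.
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    center = means.mean(axis=0)
    centered = means - center
    # Spread of the means around the global center (1/N normalization).
    between = centered.T @ centered / len(means)
    # Average uncertainty carried by the individual inputs.
    within = covs.mean(axis=0)
    return between + within, center

def ua_pca(means, covs, n_components=2):
    """Project uncertain points onto the leading principal components."""
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    cov, center = uncertainty_aware_covariance(means, covs)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                      # shape (d, k)
    # A linear map sends N(mu, Sigma) to N(W^T mu, W^T Sigma W).
    proj_means = (means - center) @ W          # shape (N, k)
    proj_covs = W.T @ covs @ W                 # shape (N, k, k)
    return W, eigvals[order], proj_means, proj_covs
```

With all input covariances set to zero, this sketch reduces to ordinary PCA on the means. Re-running `ua_pca` with the covariances multiplied by a range of scale factors is one simple way to observe how the principal components respond to growing uncertainty, in the spirit of the sensitivity analysis described in the abstract.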