IEEE Trans Med Imaging. 2018 Feb;37(2):396-407. doi: 10.1109/TMI.2017.2749140. Epub 2017 Sep 4.
Principal component analysis (PCA) is an exploratory tool widely used in data analysis to uncover the dominant patterns of variability within a population. Despite its ability to represent a data set in a low-dimensional space, PCA's interpretability remains limited. Indeed, the components produced by PCA are often noisy or exhibit no visually meaningful patterns. Furthermore, the components are usually non-sparse, which may also impede interpretation unless arbitrary thresholding is applied. In neuroimaging, however, it is essential to uncover clinically interpretable phenotypic markers that account for the main variability in the brain images of a population. Recently, alternatives to standard PCA, such as sparse PCA (SPCA), have been proposed with the aim of limiting the density of the components. Nonetheless, sparsity alone does not entirely solve the interpretability problem in neuroimaging, since it may yield scattered and unstable components. We hypothesized that incorporating prior information about the structure of the data may improve the relevance and interpretability of brain patterns. We therefore present a simple extension of the popular PCA framework that adds structured sparsity penalties on the loading vectors in order to identify the few stable regions in the brain images that capture most of the variability. Such structured sparsity can be obtained by combining, e.g., ℓ1 and total variation (TV) penalties, where the TV regularization encodes information on the underlying structure of the data. This paper presents the structured SPCA (denoted SPCA-TV) optimization framework and a method for solving it. We demonstrate SPCA-TV's effectiveness and versatility on three different data sets. SPCA-TV can be applied to any kind of structured data, such as N-dimensional array images or meshes of cortical surfaces. The gains of SPCA-TV over unstructured approaches (such as SPCA and ElasticNet PCA) or structured approaches (such as GraphNet PCA) are significant, since SPCA-TV reveals the variability within a data set in the form of intelligible brain patterns that are easier to interpret and more stable across different samples.
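To illustrate why a TV term favors compact regions where a sparsity term alone does not, here is a minimal sketch of a combined penalty on a 2-D loading image. It is not the paper's implementation. It assumes the sparsity term is an ℓ1 norm and uses one common isotropic forward-difference discretization of TV; the function names and the weights `lam_l1` and `lam_tv` are hypothetical.

```python
import numpy as np


def l1_penalty(v):
    """Sum of absolute values of the loading image (assumed sparsity term)."""
    return float(np.abs(v).sum())


def tv_penalty(v):
    """Isotropic total variation of a 2-D array using forward differences.

    This discretization is an assumption made for illustration only.
    Differences at the last row/column are left at zero.
    """
    dx = np.zeros_like(v)
    dy = np.zeros_like(v)
    dx[:-1, :] = v[1:, :] - v[:-1, :]   # vertical neighbor differences
    dy[:, :-1] = v[:, 1:] - v[:, :-1]   # horizontal neighbor differences
    return float(np.sqrt(dx ** 2 + dy ** 2).sum())


def structured_penalty(v, lam_l1=1.0, lam_tv=1.0):
    """Weighted sum of the two terms; the weights are hypothetical names."""
    return lam_l1 * l1_penalty(v) + lam_tv * tv_penalty(v)


# Two toy loading images with the same number of active pixels.
blob = np.zeros((8, 8))
blob[2:4, 2:4] = 1.0                                  # one contiguous 2x2 region

scattered = np.zeros((8, 8))
scattered[[1, 1, 5, 5], [1, 5, 1, 5]] = 1.0           # four isolated pixels

for name, img in [("blob", blob), ("scattered", scattered)]:
    print(name, l1_penalty(img), tv_penalty(img), structured_penalty(img))
```

Both images have the same ℓ1 norm, so a sparsity penalty alone cannot tell them apart. The TV term is smaller for the contiguous region, so the combined penalty prefers it.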