Rachakonda Srinivas, Silva Rogers F, Liu Jingyu, Calhoun Vince D
The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque, NM, USA.
The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque, NM, USA; Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM, USA.
Front Neurosci. 2016 Feb 2;10:17. doi: 10.3389/fnins.2016.00017. eCollection 2016.
Principal component analysis (PCA) is widely used for data reduction in group independent component analysis (ICA) of fMRI data. Commonly, group-level PCA of temporally concatenated datasets is computed prior to ICA of the group principal components. This work focuses on reducing very high-dimensional temporally concatenated datasets into their group PCA space. Existing randomized PCA methods can determine the PCA subspace with minimal memory requirements and, thus, are ideal for solving large PCA problems. Because the number of dataloads is not typically optimized, we extend one of these methods to compute PCA of very large datasets with a minimal number of dataloads. This method is coined multi power iteration (MPOWIT). The key idea behind MPOWIT is to estimate a subspace larger than the desired one, while checking for convergence of only the smaller subset of interest. Both the number of iterations and the number of dataloads are reduced considerably, accelerating convergence without loss of accuracy. More importantly, in the proposed implementation of MPOWIT, the memory required for successful recovery of the group principal components becomes independent of the number of subjects analyzed. Highly efficient subsampled eigenvalue decomposition techniques are also introduced, furnishing excellent PCA subspace approximations that can be used for intelligent initialization of randomized methods such as MPOWIT. Together, these developments enable efficient estimation of accurate principal components, as we illustrate by solving a 1600-subject group-level PCA of fMRI with standard acquisition parameters, on a regular desktop computer with only 4 GB RAM, in just a few hours. MPOWIT is also highly scalable and could realistically solve group-level PCA of fMRI on thousands of subjects, or more, using standard hardware, limited only by time, not memory. Also, the MPOWIT algorithm is highly parallelizable, which would enable fast, distributed implementations ideal for big data analysis. Implications for other methods such as expectation maximization PCA (EM PCA) are also presented. Based on our results, general recommendations for efficient application of PCA methods are given according to problem size and available computational resources. MPOWIT and all other methods discussed here are implemented and readily available in the open source GIFT software.
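The abstract only outlines MPOWIT, so the following is a minimal NumPy sketch of its core idea: iterate a block larger than the k components wanted, but judge convergence on only the top k. It assumes each subject's (time x voxel) data is already preprocessed and available as an array (in practice each would be read from disk once per pass), a block size of 5k, and the relative change of the top-k Ritz values as the stopping rule. The function name `mpowit_sketch`, its parameters, and these choices are hypothetical illustrations, not the GIFT implementation.

```python
import numpy as np


def mpowit_sketch(subject_blocks, k, block_multiplier=5, max_iter=100, tol=1e-6, seed=0):
    """Illustrative block power iteration in the spirit of MPOWIT (hypothetical, not GIFT code).

    subject_blocks: sequence of (T_i x V) arrays, one per subject, assumed preprocessed.
    k: number of group principal components wanted.
    block_multiplier: hypothetical oversampling factor; the iterated block has
        l = k * block_multiplier columns, but only the top k are tested for convergence.
    Returns (components V x k, top-k eigenvalues of Y^T Y, iterations used), where Y is
    the temporal concatenation of all subject blocks.
    """
    V = subject_blocks[0].shape[1]
    l = min(k * block_multiplier, V)
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((V, l)))  # random orthonormal V x l start
    prev = None
    for it in range(1, max_iter + 1):
        # One pass over all subjects (one dataload each): Z = (Y^T Y) X.
        # Only V x l arrays are kept, so memory does not grow with the subject count.
        Z = np.zeros((V, l))
        for Yi in subject_blocks:
            Z += Yi.T @ (Yi @ X)
        # Rayleigh-Ritz: eigendecompose the small l x l projected matrix X^T (Y^T Y) X.
        H = X.T @ Z
        evals, evecs = np.linalg.eigh((H + H.T) / 2)
        order = np.argsort(evals)[::-1]  # eigh returns ascending order; flip it
        evals, evecs = evals[order], evecs[:, order]
        top = evals[:k]
        components = X @ evecs[:, :k]  # Ritz vectors for the k leading components
        # Convergence is checked on the k values of interest only, not on all l.
        if prev is not None:
            rel_change = np.max(np.abs(top - prev) / np.maximum(np.abs(top), 1e-300))
            if rel_change < tol:
                return components, top, it
        prev = top
        X, _ = np.linalg.qr(Z)  # re-orthonormalize before the next power step
    return components, top, max_iter
```

Each loop iteration reads every subject exactly once, so the iteration count equals the number of passes over the data. In this sketch a larger block costs more work per pass but typically needs fewer passes, because the top-k Ritz values converge at a rate governed by the ratio of the (l+1)-th to the k-th eigenvalue rather than the (k+1)-th to the k-th.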