Grellmann Claudia, Neumann Jane, Bitzer Sebastian, Kovacs Peter, Tönjes Anke, Westlye Lars T, Andreassen Ole A, Stumvoll Michael, Villringer Arno, Horstmann Annette
Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; IFB Adiposity Diseases, Leipzig University Medical Center, Leipzig, Germany.
Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; IFB Adiposity Diseases, Leipzig University Medical Center, Leipzig, Germany; Collaborative Research Center 1052-A5, University of Leipzig, Leipzig, Germany.
Front Genet. 2016 Jun 7;7:102. doi: 10.3389/fgene.2016.00102. eCollection 2016.
In recent years, technological advances have produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in a growing range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in genetic neuroimaging. To handle high-dimensional problems, dimension reduction can be implemented as a pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into PLSC to efficiently solve high-dimensional multimodal problems such as genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole-genome SNP measures as genotypes and whole-brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is independent of data type. PLSC-RP therefore opens up a wide range of possible applications: it can be used for any integrative analysis that combines information from multiple sources.
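The idea of reducing dimensionality with RP before running PLSC can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a Gaussian random projection matrix, z-scored columns, and PLSC computed as the singular value decomposition of the cross-product matrix. The function names (`zscore`, `plsc_rp`), the target dimension `k`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def zscore(A):
    """Column-wise standardization (assumes no constant columns)."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


def plsc_rp(X, Y, k, rng):
    """Sketch of PLSC preceded by a Gaussian random projection of X.

    X : (n, p) array with p very large (e.g. genotype data)
    Y : (n, q) array (e.g. imaging phenotypes)
    k : reduced dimension for X, assumed here to satisfy k << p
    """
    Xz, Yz = zscore(X), zscore(Y)
    p = Xz.shape[1]
    # Gaussian projection matrix, scaled so that E[P @ P.T] is the identity.
    P = rng.standard_normal((p, k)) / np.sqrt(k)
    X_red = Xz @ P                      # (n, k) instead of (n, p)
    C = X_red.T @ Yz                    # (k, q) cross-product matrix
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    Lx = X_red @ U                      # latent scores, reduced X side
    Ly = Yz @ Vt.T                      # latent scores, Y side
    return s, Lx, Ly


# Toy data: 0/1/2 values as a stand-in for SNP genotype codes.
rng = np.random.default_rng(0)
n, p, q, k = 40, 5000, 30, 200
X = rng.integers(0, 3, size=(n, p)).astype(float)
Y = rng.standard_normal((n, q))

s_rp, Lx, Ly = plsc_rp(X, Y, k, rng)
```

In this sketch the SVD operates on a k x q matrix rather than a p x q matrix, which is where the computational saving would come from when p is in the hundreds of thousands. How closely the result matches PLSC on the unreduced data depends on the choice of `k`.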