Grellmann Claudia, Bitzer Sebastian, Neumann Jane, Westlye Lars T, Andreassen Ole A, Villringer Arno, Horstmann Annette
Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany; Leipzig University Hospital, IFB Adiposity Diseases, Philipp-Rosenthal-Straße 27, 04103 Leipzig, Germany.
Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1A, 04103 Leipzig, Germany.
Neuroimage. 2015 Feb 15;107:289-310. doi: 10.1016/j.neuroimage.2014.12.025. Epub 2014 Dec 17.
The standard analysis approach in neuroimaging genetics studies is the mass-univariate linear modeling (MULM) approach. From a statistical point of view, however, this approach is disadvantageous, as it is computationally intensive, cannot account for complex multivariate relationships, and has to be corrected for multiple testing. In contrast, multivariate methods offer the opportunity to include combined information from multiple variants to discover meaningful associations between genetic and brain imaging data. We assessed three multivariate techniques, partial least squares correlation (PLSC), sparse canonical correlation analysis (sparse CCA) and Bayesian inter-battery factor analysis (Bayesian IBFA), with respect to their ability to detect multivariate genotype-phenotype associations. Our goal was to systematically compare the performance of these three approaches and to assess their suitability for the high-dimensional, multi-collinear data that are typical of neuroimaging genetics studies. In a series of simulations using both linearly independent and multi-collinear data, we show that sparse CCA and PLSC are suitable even for very high-dimensional collinear imaging data sets. Of these two, sparse CCA had the higher predictive power when voxel numbers were below 400 times sample size and candidate SNPs were considered. Accordingly, we recommend sparse CCA for candidate phenotype, candidate SNP studies. When voxel numbers exceeded 500 times sample size, the predictive power was highest for PLSC. Therefore, PLSC can be considered a promising technique for multivariate modeling of high-dimensional brain-SNP associations. In contrast, Bayesian IBFA cannot be recommended, since additional post-processing steps were necessary to detect causal relations. To verify the applicability of sparse CCA and PLSC, we applied them to an experimental imaging genetics data set provided to us.
Most importantly, application of both methods replicated the findings of this data set.
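To make the PLSC idea concrete, the following is a minimal sketch of the textbook formulation: a singular value decomposition of the cross-correlation matrix between a genotype block and an imaging block. It is not the authors' implementation. The function name `plsc`, the variable names, the dimensions, and the simulated toy data (random minor-allele counts and Gaussian "voxels") are all assumptions made for illustration.

```python
import numpy as np


def plsc(X, Y):
    """Illustrative PLSC: SVD of the cross-correlation matrix of two blocks.

    X: (n_subjects, n_snps) genotype block, Y: (n_subjects, n_voxels) imaging block.
    Names and layout are hypothetical, chosen for this sketch only.
    """
    n = X.shape[0]
    # z-score every column so the cross-product is a correlation matrix
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    R = Xz.T @ Yz / (n - 1)  # shape (n_snps, n_voxels)
    # Singular vectors are the SNP and voxel weights (saliences)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    V = Vt.T
    # Latent scores: projections of each block onto its weights
    Lx = Xz @ U
    Ly = Yz @ V
    return U, s, V, Lx, Ly


# Toy data (simulated, not from the study): 50 subjects, 5 SNPs, 200 voxels
rng = np.random.default_rng(0)
n, p, q = 50, 5, 200
X = rng.integers(0, 3, size=(n, p)).astype(float)  # minor-allele counts 0/1/2
Y = rng.standard_normal((n, q))
Y[:, :10] += 0.8 * X[:, [0]]  # assumed effect of SNP 0 on the first 10 voxels

U, s, V, Lx, Ly = plsc(X, Y)
# Covariance of the first pair of latent scores equals the first singular value
cov_first = Lx[:, 0] @ Ly[:, 0] / (n - 1)
```

Each singular value is the covariance between the corresponding pair of latent scores, which is the quantity PLSC maximizes. Because the decomposition works on the SNP-by-voxel matrix rather than on voxel-by-voxel covariance, its cost grows only linearly with the number of voxels in this formulation.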