Oblong Lennart M, Soheili-Nezhad Sourena, Trevisan Nicolò, Shi Yingjie, Beckmann Christian F, Sprooten Emma
Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Centre, Nijmegen, The Netherlands.
Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
Genes Brain Behav. 2024 Jan 15;23(1):e12876. doi: 10.1111/gbb.12876.
The highly polygenic and pleiotropic nature of behavioural traits, psychiatric disorders and structural and functional brain phenotypes complicates mechanistic interpretation of related genome-wide association study (GWAS) signals, thereby obscuring underlying causal biological processes. We propose genomic principal and independent component analysis (PCA, ICA) to decompose a large set of univariate GWAS statistics of multimodal brain traits into more interpretable latent genomic components. Here we introduce this novel method and evaluate its various analytic parameters and its reproducibility across independent samples. Two UK Biobank GWAS summary statistic releases of 2240 imaging-derived phenotypes (IDPs) were retrieved. Genome-wide beta-values and their corresponding standard-error-scaled z-values were decomposed using genomic PCA/ICA. We evaluated variance explained at multiple dimensions up to 200. We tested the inter-sample reproducibility of the output at dimensions 5, 10, 25 and 50. Reproducibility statistics of the respective univariate GWAS served as benchmarks. Reproducibility of 10-dimensional PCs and ICs showed the best trade-off among model complexity, robustness and variance explained (PCs: |r - max| = 0.33, |r - max| = 0.30; ICs: |r - max| = 0.23, |r - max| = 0.19). Genomic PC and IC reproducibility improved substantially relative to mean univariate GWAS reproducibility up to dimension 10. Genomic components clustered along neuroimaging modalities. Our results indicate that genomic PCA and ICA decompose genetic effects on IDPs from GWAS statistics with high reproducibility by taking advantage of the inherent pleiotropic patterns. These findings encourage further applications of genomic PCA and ICA as fully data-driven methods to effectively reduce the dimensionality, enhance the signal-to-noise ratio and improve interpretability of high-dimensional multitrait genome-wide analyses.
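To make the general idea concrete, the following is a minimal sketch of a PCA decomposition of a SNP-by-trait matrix of GWAS z-values, followed by a simple cross-sample reproducibility check. It is not the authors' pipeline: the data are simulated, ICA is omitted, and all names (`genomic_pca`, `max_abs_correlation`, `simulate`), the matrix sizes, the noise level and the choice of "largest absolute Pearson correlation with any replication component" as the reproducibility measure are assumptions made for illustration only.

```python
import numpy as np


def genomic_pca(z, n_components):
    """PCA of a SNP-by-trait matrix via SVD (illustrative helper, not the authors' code).

    Returns SNP-wise component scores, trait loadings and the fraction of
    variance explained by each retained component.
    """
    zc = z - z.mean(axis=0, keepdims=True)  # centre each trait column
    u, s, vt = np.linalg.svd(zc, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], var_explained[:n_components]


def max_abs_correlation(a, b):
    """For each column of a, the largest |Pearson r| with any column of b."""
    a_std = (a - a.mean(axis=0)) / a.std(axis=0)
    b_std = (b - b.mean(axis=0)) / b.std(axis=0)
    r = a_std.T @ b_std / a.shape[0]
    return np.abs(r).max(axis=1)


def simulate(rng, latent, mixing, noise_sd):
    """Hypothetical z-value matrix: shared pleiotropic signal plus sample-specific noise."""
    noise = rng.normal(scale=noise_sd, size=(latent.shape[0], mixing.shape[1]))
    return latent @ mixing + noise


rng = np.random.default_rng(0)
n_snps, n_traits, n_latent = 5000, 60, 3  # toy sizes, far smaller than real data

# Latent SNP effects shared by both samples; rows of the mixing matrix are
# scaled differently so that the components are well separated.
latent = rng.normal(size=(n_snps, n_latent))
mixing = rng.normal(size=(n_latent, n_traits)) * np.array([[2.0], [1.5], [1.0]])

z_discovery = simulate(rng, latent, mixing, noise_sd=4.0)
z_replication = simulate(rng, latent, mixing, noise_sd=4.0)

scores_d, _, var_d = genomic_pca(z_discovery, n_latent)
scores_r, _, _ = genomic_pca(z_replication, n_latent)

# Reproducibility of each discovery component across samples.
component_repro = max_abs_correlation(scores_d, scores_r)

# Benchmark: per-trait correlation of the univariate z-values across samples.
univariate_repro = np.array([
    np.corrcoef(z_discovery[:, j], z_replication[:, j])[0, 1]
    for j in range(n_traits)
])

print("variance explained per component:", np.round(var_d, 3))
print("component reproducibility:", np.round(component_repro, 3))
print("mean univariate reproducibility:", round(float(univariate_repro.mean()), 3))
```

In this toy setting the components pool signal across many noisy traits, which is why their cross-sample correlation can exceed that of the individual univariate statistics; whether and how strongly this holds on real summary statistics depends on the data and on the number of components retained.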