Department of Psychiatry and Behavioral Sciences, University of Texas, Health Science Center Houston, Houston, Texas, USA.
Big Data Science Institute, Department of Statistics, University of Oxford, Oxford, UK.
Hum Brain Mapp. 2024 Dec 1;45(17):e70044. doi: 10.1002/hbm.70044.
National and international biobanking efforts have led to the collection of large and inclusive imaging genetics datasets that enable examination of the contribution of genetic and environmental factors to the human brain in illness and health. High-resolution neuroimaging (~10 voxels) and genetic (10 single nucleotide polymorphic [SNP] variants) data are available in statistically powerful (N = 10) epidemiological and disorder-focused samples. Performing imaging genetics analyses at the full resolution afforded by these datasets is a formidable computational task, even under the assumption that the subjects are unrelated. The computational complexity rises as ~N (where N is the sample size) when accounting for relatedness among subjects. We describe fast, non-iterative simplifications that accelerate classical variance component (VC) methods, including heritability, genetic correlation, and genome-wide association analyses, in dense and complex empirical pedigrees. These approaches linearize (from N to N) the computational effort while maintaining fidelity (r ~ 0.95) with the VC results, and they take advantage of the parallel computing provided by central and graphics processing units (CPU and GPU). We show that the new approaches lead to a 10- to 10-fold reduction in computational complexity, making voxel-wise heritability, genetic correlation, and genome-wide association study (GWAS) analyses practical for large and complex samples such as those provided by the Amish and Human Connectome Projects (N = 406 and 1052 subjects, respectively) and the UK Biobank (N = 31,681). These developments are shared in the open-source SOLAR-Eclipse software.
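To make the idea of a non-iterative variance component estimate concrete, the following is a minimal sketch of one generic approach: rotate the traits onto the eigenvectors of the kinship matrix, then regress the squared rotated values on the eigenvalues. It is an illustration under stated assumptions, not necessarily what SOLAR-Eclipse implements. The function names `rotate_traits` and `fast_h2`, the simulated sibship pedigree, and all parameter values are hypothetical. In this sketch the eigendecomposition is computed once per sample, the rotation is a single matrix product shared by all traits, and the regression step costs on the order of N operations per trait.

```python
import numpy as np

def rotate_traits(Y, eigvecs):
    """Center each trait and project it onto the kinship eigenvectors."""
    Yc = Y - Y.mean(axis=0, keepdims=True)
    return eigvecs.T @ Yc

def fast_h2(Y_rot, eigvals):
    """Non-iterative heritability estimate for every trait (column) at once.

    In the rotated space each value has variance
    sigma_g2 * eigenvalue + sigma_e2, so an ordinary least-squares fit of
    the squared values on [1, eigenvalue] gives sigma_e2 (intercept)
    and sigma_g2 (slope).
    """
    X = np.column_stack([np.ones_like(eigvals), eigvals])
    coef, *_ = np.linalg.lstsq(X, Y_rot ** 2, rcond=None)
    sigma_e2, sigma_g2 = coef[0], coef[1]
    total = sigma_g2 + sigma_e2
    # Guard against a non-positive total variance estimate.
    h2 = np.divide(sigma_g2, total,
                   out=np.full_like(total, np.nan), where=total > 0)
    return np.clip(h2, 0.0, 1.0)

# Hypothetical demo data: 300 sibships of 4, 200 traits ("voxels").
rng = np.random.default_rng(0)
n_fam, fam_size, n_traits, true_h2 = 300, 4, 200, 0.5

# Relationship matrix: 1 on the diagonal, 0.5 between full siblings.
block = np.full((fam_size, fam_size), 0.5) + 0.5 * np.eye(fam_size)
K = np.kron(np.eye(n_fam), block)

# One-time eigendecomposition, reused for every trait.
eigvals, eigvecs = np.linalg.eigh(K)
n = K.shape[0]

# Simulate traits with genetic covariance K and independent noise.
G = eigvecs @ (np.sqrt(eigvals)[:, None] * rng.standard_normal((n, n_traits)))
E = rng.standard_normal((n, n_traits))
Y = np.sqrt(true_h2) * G + np.sqrt(1.0 - true_h2) * E

h2_hat = fast_h2(rotate_traits(Y, eigvecs), eigvals)
print("mean estimated h2 across traits:", float(h2_hat.mean()))
```

Because the fit is a single least-squares solve over all columns, the same call handles one trait or many, which is the property that makes voxel-wise analysis a natural fit for CPU and GPU parallelism.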