Epidemiology, Biostatistics and Biodemography, Department of Public Health, University of Southern Denmark, Odense, Denmark.
Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao, China.
Aging (Albany NY). 2020 Nov 24;12(22):22457-22494. doi: 10.18632/aging.104198.
Despite a strong genetic background in cognitive function, only a limited number of single nucleotide polymorphisms (SNPs) have been found in genome-wide association studies (GWASs). We hypothesize that this is partially due to mis-specified modeling of the phenotype distribution and of the relationship between SNP dosage and the level of the phenotype. To overcome these issues, we introduced an assumption-free method based on the generalized correlation coefficient (GCC) in a GWAS of cognitive function in Danish and Chinese twins and compared its performance with that of traditional linear models. The GCC-based GWAS identified two significant SNPs in Danish samples (rs71419535, p = 1.47e-08; rs905838, p = 1.69e-08) and two significant SNPs in Chinese samples (rs2292999, p = 9.27e-10; rs17019635, p = 2.50e-09). In contrast, linear models failed to detect any genome-wide significant SNPs. The number of top significant genes overlapping between the two samples was higher in the GCC-based GWAS than with linear models. The GCC model identified significant genetic variants missed by conventional linear models, with more replicated genes and biological pathways related to cognitive function. Moreover, the GCC-based GWAS was robust in handling correlated samples such as twin pairs. GCC is a useful statistical method for GWAS that complements traditional linear models by capturing genetic effects beyond the additive assumption.
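To illustrate why an additive linear model can miss a genuine genotype-phenotype association, the minimal sketch below uses a hypothetical toy dataset in which only heterozygotes (dosage 1) have a raised phenotype. It is not the GCC method from this study, whose definition the abstract does not give. As a stand-in for a measure that makes no linearity assumption, it uses the classical correlation ratio (eta squared), and the names `pearson_r`, `correlation_ratio`, `dosage`, and `phenotype` are illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: measures only the linear (additive) trend."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_ratio(groups, y):
    """Eta squared: share of phenotype variance explained by genotype group.

    Stand-in for a linearity-free association measure; NOT the paper's GCC.
    """
    by_group = defaultdict(list)
    for g, value in zip(groups, y):
        by_group[g].append(value)
    grand_mean = sum(y) / len(y)
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    ss_total = sum((value - grand_mean) ** 2 for value in y)
    return ss_between / ss_total


# Hypothetical toy data (not from the study): heterozygote-only effect.
dosage = [0, 0, 1, 1, 2, 2]
phenotype = [0.1, -0.1, 1.1, 0.9, 0.1, -0.1]

r = pearson_r(dosage, phenotype)
eta2 = correlation_ratio(dosage, phenotype)
print(f"Pearson r = {r:.3f}, eta squared = {eta2:.3f}")
```

In this constructed case the linear trend across dosages 0, 1, and 2 cancels out, so the Pearson correlation is essentially zero even though genotype group explains most of the phenotype variance. This is the kind of non-additive signal that a linear model is not designed to detect.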