Ueki Masao, Kawasaki Yoshinori, Tamiya Gen
Biostatistics Center, Kurume University, Fukuoka, Japan.
Department of Statistical Modeling, The Institute of Statistical Mathematics, The Graduate University for Advanced Studies, Tachikawa, Tokyo, Japan.
Genet Epidemiol. 2017 Sep;41(6):481-497. doi: 10.1002/gepi.22051. Epub 2017 Jun 19.
Genome-wide association studies (GWASs) commonly use marginal association tests for each single-nucleotide polymorphism (SNP). Because these tests treat SNPs as independent, their power is suboptimal for detecting SNPs hidden by linkage disequilibrium (LD). One way to improve power is to use a multiple regression model. However, the large number of SNPs precludes fitting them all simultaneously with multiple regression, and subset regression is infeasible because of the exorbitant number of candidate subsets. We therefore propose a new method for detecting hidden SNPs that show only weak marginal association yet are significant in a multiple regression model. Our method begins by constructing a bidirected graph locally around each focal SNP, that is, each SNP that shows a moderately sized marginal association signal. Vertices correspond to SNPs, and adjacency between vertices is defined by an LD measure. Subsequently, the method collects from each graph all shortest paths to the focal SNP. Finally, for each shortest path, the method fits a multiple regression model to all the SNPs lying on the path and tests the significance of the regression coefficient corresponding to the terminal SNP in the path. Simulation studies show that the proposed method can detect susceptibility SNPs hidden by LD that go undetected with marginal association testing or with existing multivariate methods. When applied to real GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our method detected two groups of SNPs: one in a region containing the apolipoprotein E (APOE) gene, and another in a region close to the semaphorin 5A (SEMA5A) gene.
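The three steps described in the abstract (LD graph, shortest paths to the focal SNP, path-wise multiple regression) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: squared Pearson correlation as the LD measure, the hypothetical threshold of 0.3 for adjacency, an undirected adjacency matrix, the reading that the "terminal SNP" is the end of the path opposite the focal SNP, a large-sample normal approximation for the p-value, and the toy simulated data in which SNP 1 has almost no marginal signal because its effect is masked by SNP 0. All function names are hypothetical.

```python
import math
from collections import deque

import numpy as np


def ld_graph(G, threshold=0.3):
    """Adjacency matrix linking SNP columns whose squared correlation
    (assumed LD measure) reaches the assumed threshold."""
    r2 = np.corrcoef(G, rowvar=False) ** 2
    adj = r2 >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def all_shortest_paths(adj, focal):
    """Enumerate every shortest path from the focal SNP to each SNP
    reachable from it, using breadth-first search with predecessor lists."""
    dist, preds = {focal: 0}, {focal: []}
    queue = deque([focal])
    while queue:
        u = queue.popleft()
        for v in map(int, np.flatnonzero(adj[u])):
            if v not in dist:              # first visit: record distance
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:   # another equally short route
                preds[v].append(u)

    def expand(v):
        if v == focal:
            return [[focal]]
        return [p + [v] for u in preds[v] for p in expand(u)]

    return [path for v in dist if v != focal for path in expand(v)]


def terminal_coef_test(G, y, path):
    """Ordinary least squares of y on the SNPs in `path` plus an intercept.
    Returns the z statistic and a two-sided normal-approximation p-value
    for the coefficient of the last SNP in the path."""
    X = np.column_stack([np.ones(len(y)), G[:, path]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.pinv(X.T @ X)
    z = beta[-1] / math.sqrt(cov[-1, -1])
    return z, math.erfc(abs(z) / math.sqrt(2))


# Toy data (assumed): continuous stand-ins for standardized genotypes.
rng = np.random.default_rng(0)
n = 2000
latent = rng.normal(size=n)
g0 = latent + 0.5 * rng.normal(size=n)   # focal SNP, moderate marginal signal
g1 = latent + 0.5 * rng.normal(size=n)   # in LD with g0, marginally masked
g2 = rng.normal(size=n)                  # unlinked SNP
G = np.column_stack([g0, g1, g2])
y = g0 - 0.8 * g1 + rng.normal(size=n)

adj = ld_graph(G)
paths = all_shortest_paths(adj, focal=0)
z_marg, p_marg = terminal_coef_test(G, y, [1])   # marginal test of SNP 1
for path in paths:
    z_joint, p_joint = terminal_coef_test(G, y, path)
    print(path, "joint p =", p_joint, "| marginal p of SNP 1 =", p_marg)
```

In this toy setting the only path is from SNP 0 to SNP 1, and conditioning on SNP 0 exposes the effect of SNP 1 that the marginal test misses. A real analysis would restrict the graph to a local window around each focal SNP and would need a multiple-testing correction across paths, neither of which is shown here.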