Zhang Min, Lin Yanzhu, Wang Libo, Pungpapong Vitara, Fleet James C, Zhang Dabao
Department of Statistics, Purdue University, 150 North University Street, West Lafayette, IN 47907, USA.
BMC Proc. 2009 Dec 15;3 Suppl 7(Suppl 7):S17. doi: 10.1186/1753-6561-3-s7-s17.
Currently, genome-wide association studies (GWAS) collect a massive number of single-nucleotide polymorphisms (SNPs; i.e., large p) from a relatively small number of individuals (i.e., small n) and test associations between clinical phenotypes and genetic variation one SNP at a time. Such univariate association approaches ignore the linkage disequilibrium between SNPs in regions of low recombination, which lowers the reliability of candidate gene identification. Here we propose to improve the case-control GWAS approach by implementing linear discriminant analysis (LDA) through penalized orthogonal-components regression (POCRE), a newly developed variable selection method for large p, small n data. The proposed POCRE-LDA method was applied to the Genetic Analysis Workshop 16 case-control data for rheumatoid arthritis (RA). In addition to the two regions on chromosomes 6 and 9 previously associated with RA by GWAS, we identified SNPs on chromosomes 10 and 18 as potential candidates for further investigation.
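The idea of carrying out LDA through a regression can be illustrated with a classical fact: for two classes, the Fisher discriminant direction is proportional to the least-squares coefficients obtained by regressing the 0/1 case-control label on the centered predictors. The sketch below checks this numerically on simulated genotypes. It is not POCRE and does not reproduce the paper's analysis: it uses ordinary (unpenalized) least squares, works only when n exceeds p, and the data, effect sizes, and function names (`lda_direction`, `ols_direction`) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated toy data (assumed): n individuals, p SNPs coded as 0/1/2 allele counts.
n, p = 200, 5
X = rng.integers(0, 3, size=(n, p)).astype(float)

# Hypothetical phenotype model: SNPs 0 and 3 influence case status.
logit = 1.2 * X[:, 0] - 1.0 * X[:, 3]
prob_case = 1.0 / (1.0 + np.exp(-(logit - logit.mean())))
y = (rng.random(n) < prob_case).astype(int)  # 1 = case, 0 = control


def lda_direction(X, y):
    """Fisher LDA direction: pooled within-class scatter^{-1} (mean_case - mean_control)."""
    X0, X1 = X[y == 0], X[y == 1]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)
    R0 = X0 - X0.mean(axis=0)
    R1 = X1 - X1.mean(axis=0)
    within_scatter = R0.T @ R0 + R1.T @ R1
    return np.linalg.solve(within_scatter, mean_diff)


def ols_direction(X, y):
    """Least-squares coefficients from regressing the centered label on centered SNPs."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return beta


w_lda = lda_direction(X, y)
w_ols = ols_direction(X, y)

# The two vectors should point in the same direction (cosine similarity close to 1).
cosine = w_lda @ w_ols / (np.linalg.norm(w_lda) * np.linalg.norm(w_ols))
print(cosine)
```

Because the two directions coincide, a variable selection method built for regression can in principle be used to obtain a sparse discriminant direction. When p is much larger than n, as in GWAS, the within-class scatter matrix is singular and the plain least-squares fit is not unique, which is the setting that a penalized method such as POCRE is designed for.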