Zhao Huaqing, Mitra Nandita, Kanetsky Peter A, Nathanson Katherine L, Rebbeck Timothy R
Department of Clinical Sciences, Temple University School of Medicine, 3440 N. Broad Street, Kresge Hall East, Room 218, Philadelphia, PA 19140, USA, Phone: 215-707-6139, Fax: 215-707-3160.
Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA.
Stat Appl Genet Mol Biol. 2018 Dec 4;17(6):/j/sagmb.2018.17.issue-6/sagmb-2017-0054/sagmb-2017-0054.xml. doi: 10.1515/sagmb-2017-0054.
Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components analysis (PCA), but there is no objective method to guide which principal components (PCs) to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.
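The pipeline summarized in the abstract (compute PCs, keep the significant ones, collapse them into a propensity score, adjust the association model for that score) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions the abstract does not spell out: the propensity score is taken to be the probability of carrying the test allele given the retained PCs, the number of retained PCs `k` is passed in by hand rather than chosen by the Tracy-Widom test, and the function names (`fit_logistic`, `top_pcs`, `pcaps_log_or`, `simulate`) and all simulation parameters are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson logistic regression. X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def top_pcs(G, k):
    """Scores of the k leading principal components of the centered genotype matrix."""
    Gc = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(Gc, full_matrices=False)
    return U[:, :k] * S[:k]


def pcaps_log_or(G, snp, y, k):
    """Log odds ratio for `snp`, adjusted for a propensity score built from k PCs.

    Assumption: k stands in for the number of PCs a Tracy-Widom test would retain,
    and the propensity score models carrier status (dosage > 0) given those PCs.
    """
    n = len(y)
    X_ps = np.column_stack([np.ones(n), top_pcs(G, k)])
    carrier = (snp > 0).astype(float)
    ps = 1.0 / (1.0 + np.exp(-X_ps @ fit_logistic(X_ps, carrier)))
    X_out = np.column_stack([np.ones(n), snp, ps])
    return fit_logistic(X_out, y)[1]


def simulate(seed=0, n=2000, m=200):
    """Two subpopulations that differ in allele frequencies and baseline risk.

    The test SNP has no causal effect (true OR = 1), so any association is spurious.
    """
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n // 2)
    f0 = rng.uniform(0.1, 0.9, m)
    f1 = np.clip(f0 + rng.normal(0.0, 0.15, m), 0.05, 0.95)
    G = rng.binomial(2, np.where(pop[:, None] == 0, f0, f1)).astype(float)
    snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.6)).astype(float)
    y = rng.binomial(1, np.where(pop == 0, 0.2, 0.5)).astype(float)
    return G, snp, y


G, snp, y = simulate()
# Unadjusted model: disease ~ SNP dosage.
crude = fit_logistic(np.column_stack([np.ones(len(y)), snp]), y)[1]
# Propensity-score-adjusted model: disease ~ SNP dosage + propensity score.
adjusted = pcaps_log_or(G, snp, y, k=2)
print(f"crude log OR:    {crude:.3f}")
print(f"adjusted log OR: {adjusted:.3f}")
```

In this toy setting the true log OR is 0, so the adjusted estimate should sit closer to 0 than the crude one. In a real analysis the retained PCs would come from a Tracy-Widom test in existing software rather than from a hand-picked `k`.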