Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA.
Front Genet. 2012 Dec 21;3:301. doi: 10.3389/fgene.2012.00301. eCollection 2012.
Genome-wide association (GWA) studies have become a standard approach for discovering and validating genomic polymorphisms putatively associated with phenotypes of interest. Accounting for population structure in GWA studies is critical to obtain unbiased parameter estimates and to control Type I error. One common approach is to include several principal components (PCs) derived from the entire autosomal dataset, which reflect population structure signal. However, deciding which components to include is subjective and generally not conclusive. We examined how well phylogenetic signal from mitochondrial DNA (mtDNA) and chromosome Y (chr:Y) markers agrees with principal component data based on autosomal markers, to determine whether mtDNA and chr:Y phylogenetic data can help guide principal component selection. Using HAPMAP and other original data from individuals of multiple ancestries, we examined the relationships of mtDNA and chr:Y phylogenetic signal with the autosomal PCA using best subset logistic regression. We show that while the two approaches agree at times, this agreement is independent of component order and is not completely reflected in the eigenvalues. Additionally, we use simulations to demonstrate that our approach leads to a slightly reduced Type I error rate compared to the standard approach. These results provide preliminary evidence to support the theoretical concept that mtDNA and chr:Y data can be informative in locating the PCs that are most associated with the evolutionary history of the populations being studied, although the utility of such information will depend on the specific situation.
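The idea of using best subset logistic regression to find the PCs that track a haplogroup label can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the haplogroup is coded as a binary label, subsets are ranked by AIC, and the helper names (`fit_logistic`, `best_subset`) are hypothetical. The abstract does not state which selection criterion or software was used, so these are all assumptions.

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, x):
    """Intercept plus weighted sum of the predictors."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(rows, y, steps=200, lr=1.0):
    """Fit logistic regression by gradient ascent; return the log-likelihood."""
    n, p = len(rows), len(rows[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, yi in zip(rows, y):
            err = yi - sigmoid(linear(w, x))
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    loglik = 0.0
    for x, yi in zip(rows, y):
        z = linear(w, x)
        # log-likelihood term y*z - log(1 + exp(z)), written stably
        loglik += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return loglik


def best_subset(pcs, y, max_size=3):
    """Return the PC index subset (0-based) with the lowest AIC."""
    n_pcs = len(pcs[0])
    best = None
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(n_pcs), size):
            rows = [[row[j] for j in subset] for row in pcs]
            aic = 2 * (size + 1) - 2 * fit_logistic(rows, y)
            if best is None or aic < best[0]:
                best = (aic, subset)
    return best[1]


# Synthetic example (assumption): 150 individuals, 5 autosomal PC scores.
# The binary haplogroup label depends on the third PC (index 2), not the first.
random.seed(0)
pcs = [[random.gauss(0, 1) for _ in range(5)] for _ in range(150)]
haplogroup = [1 if random.random() < sigmoid(3.0 * row[2]) else 0 for row in pcs]

selected = best_subset(pcs, haplogroup)
print("Selected PC indices (0-based):", selected)
```

In this toy setup the informative component is deliberately not the first one, mirroring the observation that agreement between haplogroup signal and PCs need not follow component order. In practice the PC scores would come from an autosomal PCA, and a multi-class haplogroup outcome would need a multinomial model or one-versus-rest coding.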