Department of Epidemiology and Biostatistics, Case School of Medicine, Cleveland, Ohio, United States of America.
PLoS One. 2012;7(5):e35235. doi: 10.1371/journal.pone.0035235. Epub 2012 May 9.
We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification, using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. The strategies included a correction based on an ancestry informative markers (AIMs) panel designed to capture European ancestral variation, and corrections utilizing un-thinned genome-wide SNP data. Case-control samples were drawn from four geographically distinct North American sites. The AIMs-only and genome-wide first principal components (PC1) both corresponded to the previously described North or Northwest-Southeast axis of European variation. We found that the genome-wide PCA captured this primary dimension of variation more precisely and identified additional axes of genome-wide variation of relevance to epithelial ovarian cancer. Associations evident between the genome-wide PCs and study site corroborate North American immigration history and suggest that undiscovered dimensions of variation lie within Northern Europe. The structure captured by the genome-wide PCA was also found within control individuals and did not reflect the case-control variation present in the data. The genome-wide PCA highlighted three regions of local linkage disequilibrium (LD), corresponding to the lactase (LCT) gene on chromosome 2, the human leukocyte antigen (HLA) system on chromosome 6, and a common inversion polymorphism on chromosome 8. These features did not compromise the efficacy of PCs from this analysis for ancestry control. We conclude that although AIMs panels are a cost-effective way of capturing population structure, genome-wide data should be preferred when available.
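As a rough illustration of the general technique (not the authors' pipeline), the sketch below computes sample principal components from a standardized genotype matrix and checks that PC1 tracks a simulated ancestry split. The function name `genotype_pcs`, the simulated allele frequencies, and all sample sizes are assumptions chosen for illustration; none of them come from the study.

```python
import numpy as np

def genotype_pcs(genotypes, n_pcs=2):
    """Return top PC scores for a samples x SNPs matrix of 0/1/2 genotypes."""
    g = np.asarray(genotypes, dtype=float)
    # Standardize each SNP column; drop monomorphic SNPs (zero variance).
    sd = g.std(axis=0)
    keep = sd > 0
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    # SVD of the standardized matrix: u * s gives per-sample PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

# Hypothetical data: two subpopulations with slightly shifted allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 100, 500
base = rng.uniform(0.1, 0.9, n_snps)
shift = rng.normal(0.0, 0.1, n_snps)
freq_a = np.clip(base + shift, 0.05, 0.95)
freq_b = np.clip(base - shift, 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, (n_per_pop, n_snps)),
    rng.binomial(2, freq_b, (n_per_pop, n_snps)),
])
labels = np.repeat([0, 1], n_per_pop)

pcs = genotype_pcs(geno, n_pcs=2)
# Absolute correlation between PC1 and the simulated population label.
pc1_label_corr = abs(np.corrcoef(pcs[:, 0], labels)[0, 1])
```

In a PCA-based correction of this kind, the leading PC scores would typically be added as covariates to the case-control association model; how many PCs to include, and whether to exclude long-range LD regions first, are analysis choices this sketch does not address.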