Evaluation of methods for adjusting population stratification in genome-wide association studies: Standard versus categorical principal component analysis.

Authors

Turkmen Asuman S, Yuan Yuan, Billor Nedret

Affiliations

Department of Statistics, The Ohio State University, Newark, Ohio.

Department of Mathematics & Statistics, Auburn University, Auburn, Alabama.

Publication information

Ann Hum Genet. 2019 Nov;83(6):454-464. doi: 10.1111/ahg.12339. Epub 2019 Jul 19.

Abstract

Unaccounted-for population stratification can produce false-positive findings and can mask true association signals when identifying disease-related genetic variants. Because principal component analysis (PCA) is computationally simple, it is widely used to adjust for population stratification. This approach has two limitations. First, genotype data are generally coded as 0, 1, and 2, the number of minor alleles, so it is more reasonable to treat them as categorical data; because PCA is inherently suitable only for continuous variables, applying it directly to genotype data is not appropriate. Second, although common variants have been studied extensively, little is known about the stratification of rare variants and its impact on association tests. Over the last decade, genome-wide association studies have shifted toward low-frequency (minor allele frequency [MAF] between 0.01 and 0.05) and rare (MAF less than 0.01) variants, which are now widely regarded as determinants of complex traits. Because rare variants are not stratified in the same way as common variants, statistical methods are needed that can capture stratification patterns for low-frequency and rare variants. To address these limitations, we investigate the performance of generalized PCA and similarity-matrix-based PCA methods in detecting underlying structure for rare and common variants. We demonstrate, through simulated and real datasets, that a special case of generalized PCA (i.e., logistic PCA) adjusts for population stratification in rare variants much more effectively than standard PCA, while the two methods perform comparably for common variants.
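As a small illustration of the genotype coding and the MAF categories named in the abstract, the following sketch computes minor allele frequencies from a 0/1/2 genotype matrix and labels each variant as rare, low-frequency, or common. It is not taken from the paper: the function names, the list-of-lists input layout (rows are individuals, columns are variants), and the choice to count a MAF of exactly 0.05 as low-frequency are all assumptions made for this example.

```python
def minor_allele_frequencies(genotypes):
    """Return the MAF of each variant.

    genotypes: list of rows, one per individual; each entry is 0, 1, or 2,
    the number of copies of the coded allele (hypothetical input layout).
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    mafs = []
    for j in range(n_variants):
        allele_count = sum(row[j] for row in genotypes)
        # Each individual carries two alleles, hence the factor of 2.
        freq = allele_count / (2 * n_samples)
        # Fold the frequency so the result always refers to the minor allele.
        mafs.append(min(freq, 1 - freq))
    return mafs


def classify_variant(maf, rare_cutoff=0.01, low_freq_cutoff=0.05):
    """Label a variant using the abstract's thresholds.

    Treating a MAF of exactly 0.05 as low-frequency is an assumption;
    the abstract only says "between 0.01 and 0.05".
    """
    if maf < rare_cutoff:
        return "rare"
    if maf <= low_freq_cutoff:
        return "low-frequency"
    return "common"


# Toy data: 200 individuals, 3 variants, all genotypes initially 0.
genotypes = [[0, 0, 0] for _ in range(200)]
genotypes[0][0] = 1                # 1 minor allele among 400
for i in range(12):
    genotypes[i][1] = 1            # 12 minor alleles among 400
for i in range(100):
    genotypes[i][2] = 1            # 100 minor alleles among 400

mafs = minor_allele_frequencies(genotypes)
labels = [classify_variant(m) for m in mafs]
```

In this toy matrix the three variants have MAFs of 0.0025, 0.03, and 0.25, so they fall into the rare, low-frequency, and common categories respectively.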
