Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai 200230, China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, China; School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, China.
Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University, Shanghai 200230, China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, China.
J Genet Genomics. 2015 Aug 20;42(8):445-53. doi: 10.1016/j.jgg.2015.06.007. Epub 2015 Jul 9.
Population stratification is a problem in genetic association studies because it is likely to highlight loci that underlie population structure rather than disease-related loci. Principal component analysis (PCA) has proven to be an effective way to correct for population stratification. However, the conventional PCA algorithm is time-consuming when applied to large datasets. We developed a graphics processing unit (GPU)-based PCA software package named SHEsisPCA (http://analysis.bio-x.cn/SHEsisMain.htm) that is highly parallel, with a maximum speedup of more than 100-fold over its CPU version. A clustering algorithm based on X-means was also implemented to detect population subgroups and to obtain matched cases and controls, in order to reduce genomic inflation and increase power. Tests on both simulated and real datasets showed that SHEsisPCA ran at extremely high speed with almost no loss of accuracy. SHEsisPCA can therefore help correct for population stratification much more efficiently than conventional CPU-based algorithms.
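To make the PCA step concrete, the following is a minimal CPU sketch in Python with NumPy of one common way to derive principal components from a genotype matrix: standardize each SNP by its allele frequency, build a sample-by-sample relationship matrix, and take its leading eigenvectors. It is not the SHEsisPCA implementation, which runs on the GPU and whose internals are not described here. The function name `genotype_pcs`, the 0/1/2 allele-count coding, the normalization, and the simulated two-population data are all assumptions made for illustration.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Top principal components of a (samples x SNPs) genotype matrix.

    Hypothetical helper for illustration; genotypes are assumed to be
    minor-allele counts coded 0, 1 or 2.
    """
    g = np.asarray(genotypes, dtype=float)

    # Estimated allele frequency of each SNP.
    p = g.mean(axis=0) / 2.0

    # Drop monomorphic SNPs: they carry no information and would
    # cause a division by zero in the scaling step.
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]

    # Centre each SNP and scale by its binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

    # Sample-by-sample relationship matrix (n_samples x n_samples).
    grm = z @ z.T / z.shape[1]

    # eigh returns eigenvalues in ascending order, so pick the largest.
    eigvals, eigvecs = np.linalg.eigh(grm)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]


# Simulated example (assumed data): two subpopulations of 30 samples
# whose allele frequencies have drifted apart at 500 SNPs.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, n_snps))
pop_b = rng.binomial(2, freq_b, size=(30, n_snps))

pcs, eigvals = genotype_pcs(np.vstack([pop_a, pop_b]), n_components=2)
print(pcs[:30, 0].mean(), pcs[30:, 0].mean())
```

In this sketch the first component takes opposite signs, on average, in the two simulated groups. Such components are what would typically be passed to a clustering step or used as covariates in the association test. The matrix product and eigendecomposition dominate the cost as the number of samples and SNPs grows, which is the kind of workload a GPU implementation is meant to accelerate.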