Misawa Kazuharu, Kamatani Naoyuki
Research Program for Computational Science, Research and Development Group for Next-Generation Integrated Living Matter Simulation, and Fusion of Data and Analysis Research and Development Team, RIKEN, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan.
Source Code Biol Med. 2011 May 24;6(1):10. doi: 10.1186/1751-0473-6-10.
Imputation of missing genotypes and haplotype reconstruction are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Because millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required.
We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use on workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, China samples of the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo.
ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. Faster genotype imputation and haplotype reconstruction through parallel computing will become increasingly important as the data sizes of such projects continue to grow. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.