

PBOOST: a GPU-based tool for parallel permutation tests in genome-wide association studies.

Affiliation

Laboratory of Bioinformatics and Computational Biology, Department of Electronic and Computer Engineering and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China.

Publication information

Bioinformatics. 2015 May 1;31(9):1460-2. doi: 10.1093/bioinformatics/btu840. Epub 2014 Dec 21.

Abstract

MOTIVATION

Marchini et al. (2005) demonstrated the importance of testing associations while allowing for interactions, and Wan et al. (2010a) proposed a fast method for detecting such associations. The method is based on the likelihood ratio test, under the assumption that the test statistic follows the χ² distribution. Many single nucleotide polymorphism (SNP) pairs with significant associations allowing for interactions have been detected using this method. However, the χ² approximation requires the expected value in each cell of the contingency table to be at least five, and this assumption is violated in some of the identified SNP pairs. In such cases, the likelihood ratio test may no longer be applicable. Because it is non-parametric, a permutation test is an ideal approach to checking the P-values calculated by the likelihood ratio test. The P-values of SNP pairs having significant associations with disease are always extremely small, so a huge number of permutations is needed to achieve a correspondingly high resolution for the P-values. To investigate whether the P-values from likelihood ratio tests are reliable, a fast tool that can carry out a large number of permutations is desirable.
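The comparison described above can be sketched in a few lines of Python. This is a minimal CPU illustration, not the PBOOST implementation: it assumes genotypes coded 0/1/2, case-control status coded 0/1, a likelihood ratio (G) statistic computed on the 9 × 2 genotype-pair-by-status table, and the common (hits + 1) / (permutations + 1) estimator. The function names are hypothetical.

```python
import math
import random


def contingency_table(geno_a, geno_b, status):
    """Count samples in a 9 x 2 table: rows are joint genotypes, columns are control/case."""
    table = [[0, 0] for _ in range(9)]
    for a, b, y in zip(geno_a, geno_b, status):
        table[3 * a + b][y] += 1
    return table


def g_statistic(table):
    """Likelihood ratio statistic G = 2 * sum(O * ln(O / E)) under row/column independence."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for row in table:
        row_total = sum(row)
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_total * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def permutation_p_value(geno_a, geno_b, status, n_perm, seed=0):
    """Estimate the P-value by shuffling case-control labels (assumed estimator, see lead-in)."""
    rng = random.Random(seed)
    observed = g_statistic(contingency_table(geno_a, geno_b, status))
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if g_statistic(contingency_table(geno_a, geno_b, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With this estimator, the smallest value that n_perm permutations can return is 1 / (n_perm + 1), which is why checking P-values near 10⁻¹² calls for on the order of 10¹² permutations.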

RESULTS

We developed a permutation tool named PBOOST. It is GPU-based and provides highly reliable P-value estimation. Using simulated data, we found that the P-values from likelihood ratio tests have a relative error of >100% when 50% of the cells in the contingency table have an expected count of less than five or when any cell of the contingency table has an expected count of zero. In terms of speed, PBOOST completed 10⁷ permutations for a single SNP pair from the Wellcome Trust Case Control Consortium (WTCCC) genome data (Wellcome Trust Case Control Consortium, 2007) within 1 min on a single Nvidia Tesla M2090 device, whereas the same task took 60 min on a single Intel Xeon E5-2650 CPU. More importantly, when simultaneously testing 256 SNP pairs with 10⁷ permutations, our tool took only 5 min, whereas the CPU program took 10 h. By permuting on a GPU cluster consisting of 40 nodes, we completed 10¹² permutations for all 280 SNP pairs reported with P-values smaller than 1.6 × 10⁻¹² in the WTCCC datasets in 1 week.
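The two conditions named above (a share of cells with expected count below five, or any cell with expected count zero) can be checked directly from a contingency table. The sketch below is only an illustration of that check, assuming a table given as a list of rows; the function names and the returned dictionary keys are hypothetical and not taken from PBOOST.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def sparsity_report(table, threshold=5.0):
    """Summarise how far a table departs from the 'expected count >= 5' rule of thumb."""
    expected = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in expected if e < threshold)
    return {
        "fraction_below_threshold": small / len(expected),
        "has_zero_expected": any(e == 0 for e in expected),
    }
```

A table flagged by either field is one where, according to the simulation result reported here, the χ²-based P-value would be worth re-checking by permutation.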

AVAILABILITY AND IMPLEMENTATION

The source code and sample data are available at http://bioinformatics.ust.hk/PBOOST.zip.

CONTACT

gyang@ust.hk; eeyu@ust.hk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

