NUS Graduate School for Integrative Sciences and Engineering, Department of Computer Science, School of Computing, National University of Singapore and Data Mining Department, Institute for Infocomm Research, Singapore.
Bioinformatics. 2011 Nov 1;27(21):2936-43. doi: 10.1093/bioinformatics/btr512. Epub 2011 Sep 7.
Many new methods have recently been proposed for detecting epistatic interactions in GWAS data. There is, however, no in-depth independent comparison of these methods yet.
Five recent methods (TEAM, BOOST, SNPHarvester, SNPRuler and Screen and Clean (SC)) are evaluated here in terms of power, type-1 error rate, scalability and completeness. In terms of power, TEAM performs best on data with main effect and BOOST performs best on data without main effect. In terms of type-1 error rate, TEAM and BOOST have higher type-1 error rates than SNPRuler and SNPHarvester. SC does not control type-1 error rate well. In terms of scalability, we tested the five methods using a dataset with 100 000 SNPs on a 64-bit Ubuntu system with a 2.66 GHz Intel(R) Xeon(R) CPU and 16 GB of memory. TEAM takes ~36 days to finish and SNPRuler reports heap allocation problems. BOOST scales up to 100 000 SNPs at a much lower cost than TEAM. SC and SNPHarvester are the most scalable. In terms of completeness, we study how frequently the pruning techniques employed by these methods incorrectly prune away the most significant epistatic interactions. We find that, on average, 20% of datasets without main effect and 60% of datasets with main effect are pruned incorrectly by BOOST, SNPRuler and SNPHarvester.
The software for the five methods tested is available from the URLs below. TEAM: http://csbio.unc.edu/epistasis/download.php. BOOST: http://ihome.ust.hk/~eeyang/papers.html. SNPHarvester: http://bioinformatics.ust.hk/SNPHarvester.html. SNPRuler: http://bioinformatics.ust.hk/SNPRuler.zip. Screen and Clean: http://wpicr.wpic.pitt.edu/WPICCompGen/.
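To give a sense of scale for the 100 000-SNP scalability test, the following minimal Python sketch counts the SNP pairs an exhaustive pairwise search would have to examine. The function names are hypothetical, and the Bonferroni correction at a significance level of 0.05 is an illustrative assumption only; the abstract does not state which multiple-testing correction each method applies, and methods that prune candidate pairs do not test every pair.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return comb(n_snps, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction.

    Assumption for illustration: the evaluated methods may use other
    corrections (for example permutation-based ones).
    """
    return alpha / n_tests


n_snps = 100_000  # dataset size used in the scalability test
n_tests = pairwise_tests(n_snps)
threshold = bonferroni_threshold(0.05, n_tests)

print(f"SNP pairs to test: {n_tests:,}")
print(f"Bonferroni per-test threshold: {threshold:.3e}")
```

With 100 000 SNPs there are roughly 5 billion pairs, which helps explain why an exhaustive method can take days to finish while methods that screen or prune candidates scale more easily, at the risk of discarding true interactions.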