Analytic and Translational Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA.
Bioinformatics. 2012 Jul 1;28(13):1797-9. doi: 10.1093/bioinformatics/bts191. Epub 2012 Apr 17.
Here we present INRICH (INterval enRICHment analysis), a pathway-based genome-wide association analysis tool that tests whether predefined gene sets are enriched for association signals across independent genomic intervals. INRICH has wide applicability, fast running time and, most importantly, robustness to potential genomic biases and confounding factors. Such factors, including varying gene size and single-nucleotide polymorphism density, linkage disequilibrium within and between genes, and overlapping genes with similar annotations, are often not accounted for by existing gene-set enrichment methods. Using a genomic permutation procedure, we generate experiment-wide empirical significance values that are corrected for the total number of sets tested and implicitly take overlap between sets into account. By simulation, we confirm a properly controlled type I error rate and reasonable power of INRICH under diverse parameter settings. As a proof of principle, we describe the application of INRICH to the NHGRI GWAS catalog.
A standalone C++ program, user manual and datasets can be freely downloaded from: http://atgu.mgh.harvard.edu/inrich/.
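The permutation idea in the abstract can be illustrated with a minimal sketch. This is not the INRICH implementation: it assumes a one-dimensional genome, places permuted intervals uniformly at random while keeping only their length (INRICH's procedure is described as robust to gene size, SNP density and linkage disequilibrium, none of which this sketch models), and uses a simple min-p step for the correction across sets. All function names, parameters and coordinates here are hypothetical.

```python
import random


def count_hits(intervals, gene_set):
    """Count intervals that overlap at least one gene in the set.

    Intervals and genes are half-open (start, end) pairs on one axis.
    """
    return sum(
        any(start < g_end and g_start < end for g_start, g_end in gene_set)
        for start, end in intervals
    )


def random_intervals(intervals, genome_length, rng):
    """Re-place each interval uniformly at random, preserving its length.

    Assumption: uniform placement stands in for a matched placement scheme.
    """
    placed = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        placed.append((new_start, new_start + length))
    return placed


def enrichment(intervals, gene_sets, genome_length, n_perm=1000, seed=0):
    """Return (per-set empirical p, p corrected across all sets tested)."""
    rng = random.Random(seed)
    names = list(gene_sets)
    observed = {n: count_hits(intervals, gene_sets[n]) for n in names}

    # Null distribution: hit counts for randomly placed interval sets.
    null = {n: [] for n in names}
    for _ in range(n_perm):
        perm = random_intervals(intervals, genome_length, rng)
        for n in names:
            null[n].append(count_hits(perm, gene_sets[n]))

    # Per-set empirical p-value with the usual add-one correction.
    p_set = {
        n: (1 + sum(v >= observed[n] for v in null[n])) / (1 + n_perm)
        for n in names
    }

    # Min-p step: for each replicate, rank it against all replicates in
    # every set and keep the smallest p. Because the same replicate is
    # scored against all sets, overlap between sets is reflected here.
    min_p_null = [
        min(sum(v >= null[n][i] for v in null[n]) / n_perm for n in names)
        for i in range(n_perm)
    ]
    p_corrected = {
        n: (1 + sum(m <= p_set[n] for m in min_p_null)) / (1 + n_perm)
        for n in names
    }
    return p_set, p_corrected


# Toy data (made up): five short intervals, each inside a gene of "set_a".
toy_intervals = [(120, 130), (1020, 1030), (3020, 3030), (5020, 5030), (7020, 7030)]
toy_gene_sets = {
    "set_a": [(100, 200), (1000, 1100), (3000, 3100), (5000, 5100), (7000, 7100)],
    "set_b": [(9000, 9050)],
}
p_set, p_corrected = enrichment(toy_intervals, toy_gene_sets, 10_000, n_perm=200)
print(p_set, p_corrected)
```

Ranking each replicate against the other replicates is a shortcut that reuses one set of permutations for both steps; drawing a second, independent set of replicates for the correction would be a reasonable alternative.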