Lee Sungyoung, Choi Sungkyoung, Kim Young Jin, Kim Bong-Jo, Hwang Heungsun, Park Taesung
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea.
Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Chungcheongbuk-Do 363-951, Korea.
Bioinformatics. 2016 Sep 1;32(17):i586-i594. doi: 10.1093/bioinformatics/btw425.
To address the 'missing heritability' issue, many statistical methods have been proposed for pathway-based analysis of rare variants that analyze pathways individually. However, neglecting correlations between pathways can yield misleading results, and pathway-based analyses of large-scale genetic datasets impose a massive computational burden. We propose a Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) for the analysis of rare variants. PHARAOH constructs a single hierarchical model that consists of collapsed gene-level summaries and pathways, and it analyzes all pathways simultaneously by imposing ridge-type penalties on both gene and pathway coefficient estimates. Hence, our method accounts for the correlation among pathways without being constrained by the multiple testing problem.
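The two-level idea described above (collapse rare variants into gene-level summaries, then estimate gene weights and pathway coefficients jointly under ridge penalties) can be illustrated with a minimal sketch. This is not the PHARAOH implementation: the function names, the burden-style collapsing rule, the MAF threshold, the penalty values and the simple alternating least-squares loop are all assumptions chosen for illustration, and the sketch handles only a quantitative phenotype.

```python
import numpy as np

def collapse_rare_variants(genotypes, gene_of_variant, n_genes, maf_threshold=0.01):
    """Burden-style collapsing (an assumed rule): for each gene, sum the
    minor-allele counts of its variants whose MAF is below the threshold."""
    genotypes = np.asarray(genotypes, dtype=float)  # individuals x variants, coded 0/1/2
    maf = genotypes.mean(axis=0) / 2.0
    gene_scores = np.zeros((genotypes.shape[0], n_genes))
    for j in np.flatnonzero(maf < maf_threshold):
        gene_scores[:, gene_of_variant[j]] += genotypes[:, j]
    return gene_scores

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y; assumes lam > 0."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def fit_hierarchical(gene_scores, pathways, y, lam_gene=1.0, lam_pathway=1.0, n_iter=50):
    """Simplified alternating least squares for y ~ sum_k beta_k * (X_k w_k).

    pathways is a list of gene-index lists (hypothetical input format);
    a gene may appear in more than one pathway.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    X = gene_scores - gene_scores.mean(axis=0)
    blocks = [X[:, idx] for idx in pathways]
    weights = [np.ones(b.shape[1]) / b.shape[1] for b in blocks]
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    for _ in range(n_iter):
        # Step 1: gene weights fixed, ridge-estimate the pathway coefficients.
        F = np.column_stack([b @ w for b, w in zip(blocks, weights)])
        beta = ridge_fit(F, y, lam_pathway)
        # Step 2: pathway coefficients fixed, ridge-estimate all gene weights jointly.
        Z = np.hstack([b * bk for b, bk in zip(blocks, beta)])
        weights = np.split(ridge_fit(Z, y, lam_gene), splits)
    # Refresh the pathway coefficients so they match the final gene weights.
    F = np.column_stack([b @ w for b, w in zip(blocks, weights)])
    beta = ridge_fit(F, y, lam_pathway)
    return weights, beta
```

Because all pathways enter one model, correlated pathways compete for the same phenotypic variance rather than being tested one at a time. The sketch omits component standardization, penalty selection by cross-validation and permutation-based inference, all of which a real analysis would need.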
Through simulation studies, the proposed method was shown to have higher statistical power than existing pathway-based methods. In addition, our method was applied to large-scale whole-exome sequencing data with levels of a liver enzyme as the phenotype, using two well-known pathway databases, Biocarta and KEGG. This application demonstrated that our method not only identified associated pathways but also detected biologically plausible pathways for the phenotype of interest. These findings were replicated in an independent large-scale exome chip study.
An implementation of PHARAOH is available at http://statgen.snu.ac.kr/software/pharaoh/
Supplementary data are available at Bioinformatics online.