Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.
Genet Epidemiol. 2013 Jul;37(5):478-94. doi: 10.1002/gepi.21728. Epub 2013 May 5.
For analyzing complex trait association with sequencing data, most current studies test aggregated effects of variants in a gene or genomic region. Although gene-based tests have insufficient power even for moderately sized samples, pathway-based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods were originally designed for genome-wide association studies and have not been comprehensively evaluated for sequencing data. Moreover, region-based rare variant association methods, although potentially applicable to pathway-based analysis by extending their region definition to gene sets, have never been rigorously tested in that setting. In the context of exome-based studies, we use simulated and real datasets to evaluate pathway-based association tests. Our simulation strategy adopts a genome-wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing pathway-based methods to be evaluated under realistic, quantifiable assumptions about the underlying genetic architecture. The results show that, although no single pathway-based association method offers superior performance in all simulated scenarios, a modification of the Gene Set Enrichment Analysis approach that uses statistics from single-marker tests without gene-level collapsing (the weighted Kolmogorov-Smirnov [WKS]-Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., the sequence kernel association test) to pathway analysis offers similar power, but the results are sensitive to assumptions about genetic architecture. We applied pathway association analysis to an exome-sequencing dataset for chronic obstructive pulmonary disease and found that the WKS-Variant method confirms previously published associated genes.
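The idea behind a variant-level weighted Kolmogorov-Smirnov score can be illustrated with a minimal sketch. This is not the authors' implementation: it follows the generic running-sum enrichment score of Gene Set Enrichment Analysis and assumes that each variant carries a non-negative single-marker statistic (for example, -log10 of its p-value). The function names, the weighting exponent `p`, and the use of set-label permutation (rather than phenotype permutation) are all assumptions made for illustration.

```python
import numpy as np

def weighted_ks_enrichment(stats, in_set, p=1.0):
    """GSEA-style weighted KS enrichment score on variant-level statistics.

    stats  : non-negative single-variant association statistics (assumed)
    in_set : boolean mask, True where the variant lies in the pathway
    p      : weighting exponent (hypothetical default of 1.0)
    """
    stats = np.asarray(stats, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)

    # Rank variants from strongest to weakest association.
    order = np.argsort(-stats)
    weights = stats[order] ** p
    hits = in_set[order]

    n_miss = hits.size - hits.sum()
    hit_total = weights[hits].sum()
    if hit_total == 0 or n_miss == 0:
        return 0.0  # score is undefined; report no enrichment

    # Step up by the normalized weight at pathway variants,
    # step down by a constant at all other variants.
    step_hit = np.where(hits, weights / hit_total, 0.0)
    step_miss = np.where(hits, 0.0, 1.0 / n_miss)
    running = np.cumsum(step_hit - step_miss)

    # The score is the running-sum value with the largest absolute deviation.
    return float(running[np.argmax(np.abs(running))])

def permutation_pvalue(stats, in_set, n_perm=1000, seed=0):
    """One-sided p-value from shuffling pathway membership (a simplification)."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    observed = weighted_ks_enrichment(stats, in_set)
    exceed = sum(
        weighted_ks_enrichment(stats, rng.permutation(in_set)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Because the score is computed directly on variants, no gene-level collapsing step is needed; a pathway is simply the set of variants that fall in its genes. In a real analysis, permuting phenotypes and recomputing the single-marker statistics would preserve linkage disequilibrium between variants, which label shuffling does not.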