Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA.
Genomics. 2011 Jul;98(1):1-8. doi: 10.1016/j.ygeno.2011.04.006. Epub 2011 Apr 30.
Recent studies have demonstrated that gene set analysis, which tests for disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association study (GWAS) data. These approaches aim to increase statistical power by combining association signals from multiple genes in the same gene set. Gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches are still at an early stage of development: their results are often prone to sources of bias, including gene set size, gene length, linkage disequilibrium patterns, and the presence of overlapping genes. In this paper, we provide an in-depth review of gene set analysis procedures, along with the parameter choices and particular methodological challenges at each stage. In addition to surveying recently developed tools, we classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies used in gene set analysis.