De La Vega Francisco M, Bustamante Carlos D, Leal Suzanne M
Life Technologies, Foster City, CA 94403, USA.
Pac Symp Biocomput. 2011:74-5. doi: 10.1142/9789814335058_0008.
Genome-wide association studies (GWAS) have been very successful in identifying common genetic variation associated with numerous complex diseases [1]. However, most of the identified common genetic variants appear to confer modest risk, and few causal alleles have been identified [2]. Furthermore, these associations account for a small portion of the total heritability of inherited disease variation [1]. This has led to the reexamination of the contribution of environment, gene-gene and gene-environment interactions, and rare genetic variants in complex diseases [1, 3, 4]. There is strong evidence that rare variants play an important role in complex disease etiology and may have larger genetic effects than common variants [2]. Currently, much of what we know regarding the contribution of rare genetic variants to disease risk is based on a limited number of phenotypes and candidate genes. However, rapid advancement of second-generation sequencing technologies will inevitably lead to widespread association studies comparing whole-exome and eventually whole-genome sequences of cases and controls. A tremendous challenge for enabling these "next generation" medical genomic studies is developing statistical approaches for correlating rare genetic variants with disease outcome. The analysis of rare variants is challenging since methods used for common variants are woefully underpowered. Therefore, methods that can deal with genetic heterogeneity at the trait-associated locus have been developed to analyze rare variants. Instead of analyzing individual variants, these methods analyze variants within a region/gene as a group and usually rely on collapsing. Methods that can be applied to both case-control and quantitative trait studies are needed. The paper of Bansal et al. in this volume describes the application of a number of statistical methods for testing associations between rare variants in two genes and obesity.
The authors considered the relative merits of the different methods as well as important implementation details, such as the leveraging of genomic annotations and the determination of p-values. Knowledge of haplotypes can increase the power of GWAS and also highlight associations that are impossible to detect without haplotype phase (e.g. loss of heterozygosity). Even more complicated phase-dependent interactions of variants in linkage equilibrium have also been suggested as possible causes of missing heritability. In their work, Hallsorsson et al. formulate algorithmic strategies for haplotype phasing by multi-assembly of shared haplotypes from next-generation sequencing data. These methods would allow haplotypes harboring rare variants to be tested for association and could potentially increase their explanatory power. Since single-SNP tests are often underpowered in rare variant association analysis, Zeggini and Asimit propose a locus-based method that has high power in the presence of rare variants and that incorporates the base quality scores available for sequencing data. Their results suggest that this multi-marker approach may be best suited for smaller regions, or for use after some filtering that reduces the number of SNPs tested jointly and thereby limits the loss of power due to multiple-testing adjustments. Finally, the paper of Zhou et al. presents a penalized regression framework for association testing on sequence data in the presence of both common and rare variants. This method also introduces the use of weights to incorporate available biological information on the variants. Although these tactics improve both false positive and false negative rates, they represent an incremental development and there is still significant room for improvement. With the development of sequencing technologies and of methods to detect rare variant associations with complex traits, many new and exciting discoveries are imminent.
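The collapsing idea and the permutation-based p-values mentioned above can be illustrated with a minimal sketch. It is a generic carrier-status burden test on toy data, not the specific method of any paper in this session; the function names, the 0.05 frequency threshold, and the genotype matrix are all assumptions made for illustration.

```python
import random

def minor_allele_frequencies(genotypes):
    """Per-variant minor allele frequency from 0/1/2 alternate-allele counts."""
    n = len(genotypes)
    freqs = []
    for j in range(len(genotypes[0])):
        alt = sum(row[j] for row in genotypes) / (2 * n)
        freqs.append(min(alt, 1 - alt))
    return freqs

def collapse_rare(genotypes, maf_threshold):
    """1 if an individual carries any variant below the threshold, else 0.

    Assumes the alternate allele is the minor allele, which holds for the toy data.
    """
    mafs = minor_allele_frequencies(genotypes)
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]

def carrier_rate_difference(carrier, is_case):
    """Carrier proportion in cases minus carrier proportion in controls."""
    cases = [c for c, y in zip(carrier, is_case) if y]
    controls = [c for c, y in zip(carrier, is_case) if not y]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def permutation_p_value(carrier, is_case, n_perm=2000, seed=0):
    """Two-sided p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = abs(carrier_rate_difference(carrier, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(carrier_rate_difference(carrier, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Hypothetical toy data: 20 cases, 20 controls, 5 variants in one gene.
n_cases, n_controls, n_variants = 20, 20, 5
genotypes = [[0] * n_variants for _ in range(n_cases + n_controls)]
is_case = [True] * n_cases + [False] * n_controls
for i in range(8):
    genotypes[i][i // 2] = 1   # cases 0-7: two carriers for each of variants 0-3
genotypes[20][0] = 1           # a single control carries rare variant 0
for i in range(0, 40, 2):
    genotypes[i][4] = 1        # variant 4 is common, so it is not collapsed

carrier = collapse_rare(genotypes, maf_threshold=0.05)
observed_diff = carrier_rate_difference(carrier, is_case)
p_value = permutation_p_value(carrier, is_case)
print(sum(carrier), observed_diff, p_value)
```

In this toy gene no single rare variant has more than three carriers, so per-variant tests have little to work with, whereas the collapsed indicator compares 8 of 20 case carriers against 1 of 20 control carriers. Real analyses would also need to handle covariates, variant direction of effect, and genotype quality, none of which this sketch addresses.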
The analysis of rare variants is still in its infancy, and the next few years promise to produce many new methods to meet the special demands of analyzing this type of data. Note from Publisher: This article contains the abstract and references.
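The use of weights to bring biological information into a penalized regression, mentioned above, can be sketched with a generic weighted lasso fitted by coordinate descent. This is not the framework of Zhou et al.; the objective (1/2n)·||y − Xβ||² + λ·Σ w_j·|β_j|, the function names, the weights, and the toy data are assumptions chosen to show how a smaller penalty weight lets an annotated variant enter the model.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def weighted_lasso(X, y, lam, weights, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(x * x for x in col) / n
            if z == 0.0:
                continue  # monomorphic variant carries no information
            # Covariance of column j with the residual, adding back j's own effect.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam * weights[j]) / z
            delta = new_beta - beta[j]
            residual = [r - x * delta for x, r in zip(col, residual)]
            beta[j] = new_beta
    return beta

# Hypothetical toy data: 8 individuals, 2 variants, a quantitative trait y.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
y = [2.0, 2.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]

# Equal penalties: the weaker effect of variant 1 is shrunk to exactly zero.
unweighted = weighted_lasso(X, y, lam=0.2, weights=[1.0, 1.0])
# Variant 1 is given a smaller penalty, as if an annotation marked it as functional.
annotated = weighted_lasso(X, y, lam=0.2, weights=[1.0, 0.25])
print(unweighted, annotated)
```

The design choice illustrated here is that prior information acts through the penalty rather than through the data: the genotypes and trait values are identical in both fits, and only the per-variant weight decides whether the second variant is retained.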