Li Qizhai, Zhang Hong, Yu Kai
Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
Hum Hered. 2010;69(4):219-28. doi: 10.1159/000291927. Epub 2010 Mar 24.
Most current genetic association studies, including genome-wide association studies, focus on single nucleotide polymorphisms (SNPs) with a relatively large minor allele frequency (MAF) (e.g. >5%) in the search for genetic loci underlying susceptibility to complex diseases. Focusing on common SNPs is very effective under the common-disease-common-variant (CDCV) hypothesis, which holds that common diseases are caused by common variants with relatively small to moderate effects. Although the CDCV hypothesis has become the dogma guiding the conduct of association studies over the past decade, growing evidence from recent empirical data and simulations suggests that the causal genetic polymorphisms for common diseases, including SNPs and copy number variants (CNVs), span a wide spectrum of MAFs, from rare to common. In contrast to common variants, statistical approaches for the analysis of rare variants have received very little attention. Methods developed for common variants usually rely on asymptotic (large-sample) approximations, which can be inaccurate for rare variants when the sample size is limited. Although Fisher's exact test can be used in this setting, it is usually conservative, which limits its usefulness to some extent. Here we propose two novel approaches for the analysis of rare genetic variants. Simulation studies and two real examples demonstrate the advantages of the proposed methods over existing methods.
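The conservativeness of Fisher's exact test for rare variants can be illustrated with a short calculation. The sketch below is not either of the authors' proposed methods. It assumes an allele-count 2x2 table (cases vs. controls, minor vs. major allele) for diploid individuals and computes, under the null hypothesis with all margins fixed, the exact probability that Fisher's two-sided test rejects at nominal level 0.05. When only a few copies of the minor allele are observed, the conditional (hypergeometric) null distribution takes just a handful of values, so few or no tables can reach p <= 0.05 and the actual size falls below the nominal level. The function names (`null_pmf`, `fisher_two_sided_p`, `fisher_size`) and the sample sizes are hypothetical choices for illustration.

```python
from math import comb


def null_pmf(n_case, n_control, n_minor_total):
    """Null distribution of X = number of minor alleles carried by cases.

    Assumption: diploid individuals, allele-count 2x2 table with all margins
    fixed (the conditioning used by Fisher's exact test). Then X is
    hypergeometric: n_minor_total minor alleles are spread over
    2*(n_case + n_control) alleles, of which 2*n_case belong to cases.
    """
    n_total = 2 * (n_case + n_control)
    n_case_alleles = 2 * n_case
    lo = max(0, n_case_alleles - (n_total - n_minor_total))
    hi = min(n_minor_total, n_case_alleles)
    denom = comb(n_total, n_case_alleles)
    return {
        x: comb(n_minor_total, x)
        * comb(n_total - n_minor_total, n_case_alleles - x)
        / denom
        for x in range(lo, hi + 1)
    }


def fisher_two_sided_p(x_obs, pmf):
    """Two-sided Fisher p-value: total null probability of all tables that are
    no more likely than the observed one (small relative tolerance for ties)."""
    p_obs = pmf[x_obs]
    return min(1.0, sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7)))


def fisher_size(n_case, n_control, n_minor_total, alpha=0.05):
    """Exact type I error of Fisher's two-sided test at nominal level alpha,
    conditional on the table margins: the null probability of every table
    whose p-value is <= alpha."""
    pmf = null_pmf(n_case, n_control, n_minor_total)
    return sum(p for x, p in pmf.items() if fisher_two_sided_p(x, pmf) <= alpha)


if __name__ == "__main__":
    # Hypothetical design: 100 cases and 100 controls; vary the number of
    # minor-allele copies observed in the whole sample.
    for m in (3, 10, 40):
        print(f"minor alleles = {m:2d}  actual size = {fisher_size(100, 100, m):.4f}")
```

With only 3 copies of the minor allele, even the most extreme split (all 3 in cases, or all 3 in controls) has a two-sided p-value of about 0.25, so the test can never reject at 0.05 and its actual size is exactly 0. As the minor-allele count grows, the set of attainable p-values grows, but by construction the size never exceeds the nominal level.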