Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK.
BMC Bioinformatics. 2013 Jun 6;14:177. doi: 10.1186/1471-2105-14-177.
The development of genotyping arrays containing hundreds of thousands of rare variants across the genome, together with advances in high-throughput sequencing technologies, has made empirical genetic association studies that search for rare disease susceptibility alleles feasible. As single variant testing is underpowered to detect associations, the development of statistical methods that combine analysis across variants - so-called "burden tests" - is an area of active research interest. We previously developed a method, the admixture maximum likelihood test, to test multiple common variants for association with a trait of interest. We have extended this method to the analysis of rare variants; the extension is called the rare admixture maximum likelihood test (RAML). In this paper we compare the performance of RAML with six other burden tests designed to test for association of rare variants.
We used simulation over a range of scenarios to compare the power of RAML with that of the other rare variant association testing methods. The scenarios modelled differences in effect variability, the average direction of effect and the proportion of associated variants, and power was evaluated for every scenario. RAML tended to have the greatest power in most scenarios where the proportion of associated variants was small, whereas SKAT-O performed a little better in the scenarios with a higher proportion of associated variants.
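As a rough illustration of how power can be estimated by simulation across such scenarios, the sketch below simulates a quantitative trait and applies a simple collapsing burden test (correlation between each subject's rare-allele count and the phenotype). It is not the RAML method or any of the compared tests. All function names, parameter names and default values are assumptions chosen for illustration and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def simulate_dataset(rng, n_subjects, n_variants, maf,
                     prop_assoc, mean_effect, sd_effect):
    """Simulate one study; return (burden_scores, phenotypes)."""
    n_assoc = round(prop_assoc * n_variants)
    # Associated variants draw an effect from N(mean_effect, sd_effect);
    # the remaining variants have no effect on the trait.
    effects = [rng.gauss(mean_effect, sd_effect) if j < n_assoc else 0.0
               for j in range(n_variants)]
    burden, pheno = [], []
    for _ in range(n_subjects):
        # Genotype = number of minor alleles (0, 1 or 2) at each variant.
        geno = [(rng.random() < maf) + (rng.random() < maf)
                for _ in range(n_variants)]
        burden.append(sum(geno))
        signal = sum(b * g for b, g in zip(effects, geno))
        pheno.append(signal + rng.gauss(0.0, 1.0))
    return burden, pheno


def burden_p_value(burden, pheno):
    """Two-sided p-value for the burden/phenotype correlation
    (Fisher z approximation; assumes more than 3 subjects)."""
    n = len(burden)
    mx, my = sum(burden) / n, sum(pheno) / n
    sxx = sum((x - mx) ** 2 for x in burden)
    syy = sum((y - my) ** 2 for y in pheno)
    sxy = sum((x - mx) * (y - my) for x, y in zip(burden, pheno))
    if sxx == 0 or syy == 0:
        return 1.0  # no variation, so no evidence of association
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(n_reps, alpha, seed, **scenario):
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        burden, pheno = simulate_dataset(rng, **scenario)
        if burden_p_value(burden, pheno) < alpha:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    base = dict(n_subjects=200, n_variants=20, maf=0.01)
    # Hypothetical scenarios varying proportion, direction and variability.
    scenarios = {
        "same direction": dict(prop_assoc=0.5, mean_effect=1.0, sd_effect=0.0),
        "mixed direction": dict(prop_assoc=0.5, mean_effect=0.0, sd_effect=1.0),
        "null": dict(prop_assoc=0.0, mean_effect=0.0, sd_effect=0.0),
    }
    for name, params in scenarios.items():
        power = estimate_power(100, 0.05, seed=1, **base, **params)
        print(name, power)
```

A simple allele-count test of this kind is expected to lose power when effects point in both directions, which is one reason comparisons like the one described here vary the average direction and variability of effects as well as the proportion of associated variants.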
The RAML method makes no assumptions about the proportion of variants that are associated with the phenotype of interest or about the magnitude and direction of their effects. The method is flexible: it can be applied to both dichotomous and quantitative traits and allows covariates to be included in the underlying regression model. The RAML method performed well compared to the other methods over a wide range of scenarios. Power was generally moderate in most of the scenarios, underlining the need for large sample sizes in any form of association testing.