Department of Biostatistics, University of Washington, Seattle, USA.
Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA.
Brief Bioinform. 2019 Jan 18;20(1):245-253. doi: 10.1093/bib/bbx107.
Genome-wide association studies have been an important approach for localizing trait loci, with a primary focus on common variants. The multiple rare variant-common disease hypothesis may explain the missing heritability that remains after accounting for identified common variants. Advances in sequencing technologies and their decreasing costs, coupled with methodological advances for association studies in large samples, now make it feasible to study rare variants on a genome-wide scale. Family-based association designs have seen a resurgence because of their advantages for studying rare variants, and this has stimulated further methods development, mainly based on linear mixed models (LMMs). Other tests, such as score tests, can have advantages over LMMs but to date have mainly been proposed for single-marker association testing. In this article, we extend several score tests (χ²_corrected, WQLS and SKAT) to the multiple-variant association framework and evaluate their statistical performance relative to the LMM. Moreover, we show that these three tests can be expressed in terms of the differences between marker allele frequencies (AFs) estimated in the affected group and in the unaffected group. We show that these tests are flexible, as they can be applied to related subjects, unrelated subjects, or a mix of both. They also make feasible an increasingly common design in which only a subset of affected subjects (related or unrelated) is sequenced and publicly available AFs estimated in a group of healthy subjects serve as the comparison. Finally, we show that linkage disequilibrium has a large impact on the performance of all of these tests.
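The idea that a multiple-variant test can be written in terms of affected-versus-unaffected AF differences can be illustrated with a minimal sketch. This is not the authors' χ²_corrected, WQLS or SKAT implementation: it assumes unrelated subjects only (no kinship correction), genotypes coded 0/1/2, and a simple weighted burden-type sum of AF differences. The function names and the `weights` parameter are hypothetical. LD between variants enters through the pooled genotype covariance matrix, which is why ignoring LD would misstate the variance of the combined statistic.

```python
import math
from typing import List, Sequence

Genotypes = Sequence[Sequence[int]]  # genotypes[i][j] in {0, 1, 2}: subject i, variant j


def allele_freqs(genotypes: Genotypes) -> List[float]:
    """Per-variant allele frequency estimated as (sum of allele counts) / (2n)."""
    n, m = len(genotypes), len(genotypes[0])
    return [sum(g[j] for g in genotypes) / (2 * n) for j in range(m)]


def genotype_cov(genotypes: Genotypes) -> List[List[float]]:
    """Sample covariance of genotype counts across variants (captures LD)."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(g[j] for g in genotypes) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for g in genotypes:
        for j in range(m):
            for k in range(m):
                cov[j][k] += (g[j] - means[j]) * (g[k] - means[k])
    return [[c / (n - 1) for c in row] for row in cov]


def af_difference_burden_test(
    cases: Genotypes, controls: Genotypes, weights: Sequence[float]
) -> tuple:
    """Hypothetical burden-type test on AF differences, unrelated subjects only.

    Statistic: T = sum_j w_j * (p_A,j - p_U,j).
    Under the null, Cov(p_A - p_U) = Sigma / 4 * (1/n_A + 1/n_U), where Sigma is
    the genotype covariance estimated from the pooled sample.
    Returns (z, two-sided p-value from the normal approximation).
    """
    n_a, n_u = len(cases), len(controls)
    m = len(weights)
    d = [pa - pu for pa, pu in zip(allele_freqs(cases), allele_freqs(controls))]
    sigma = genotype_cov(list(cases) + list(controls))
    scale = (1.0 / n_a + 1.0 / n_u) / 4.0
    stat = sum(w * dj for w, dj in zip(weights, d))
    var = scale * sum(
        weights[j] * sigma[j][k] * weights[k] for j in range(m) for k in range(m)
    )
    if var <= 0:
        raise ValueError("zero variance: all weighted variants are monomorphic")
    z = stat / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p
```

For example, `af_difference_burden_test(cases, controls, [1.0, 1.0])` combines two variants with equal weights. A SKAT-like variant would instead use a weighted sum of squared differences, whose null distribution is a mixture of χ² variables rather than a single normal.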