Sabourin Jeremy, Nobel Andrew B, Valdar William
Department of Genetics, University of North Carolina at Chapel Hill, North Carolina, United States of America; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, North Carolina, United States of America.
Genet Epidemiol. 2015 Feb;39(2):77-88. doi: 10.1002/gepi.21869. Epub 2014 Nov 21.
Genomewide association studies (GWAS) sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple single-nucleotide polymorphisms (SNPs) simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow-up studies. Current multi-SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA-dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights. For each SNP, it estimates how robust the SNP's association with the phenotype is, both to sampling variation and to competing explanations from other SNPs. In producing an SNP prioritization that best identifies underlying true signals, we show the following: our method easily outperforms a single-marker analysis; when additive-only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive-only effects; and when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation.
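The core idea (one additive and one dominance column per SNP, grouped so that a SNP enters or leaves the model as a unit, refit under random fractional observation weights, and ranked by how often each SNP is selected) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' LLARRMA-dawg implementation: the coding (additive = expected allele count minus 1, dominance = heterozygote dosage), the Dirichlet (Bayesian-bootstrap) weights, the fixed penalty `lam`, and the hypothetical helper names `encode_additive_dominance`, `weighted_group_lasso` and `selection_frequency`. Calibrated randomized penalization is not shown.

```python
import numpy as np


def encode_additive_dominance(probs):
    """Two columns per SNP from genotype probabilities P(g = 0, 1, 2).

    Assumed coding (illustrative only): additive = expected allele count - 1,
    dominance = heterozygote dosage P(g = 1). Hard calls are one-hot rows.
    Returns X with columns [add_1, dom_1, add_2, dom_2, ...] and a SNP label per column.
    """
    probs = np.asarray(probs, dtype=float)               # shape (n, p, 3)
    n, p, _ = probs.shape
    X = np.empty((n, 2 * p))
    X[:, 0::2] = probs[:, :, 1] + 2.0 * probs[:, :, 2] - 1.0
    X[:, 1::2] = probs[:, :, 1]
    return X, np.repeat(np.arange(p), 2)


def weighted_group_lasso(X, y, groups, w, lam, n_iter=300):
    """Proximal gradient (ISTA) for the weighted group LASSO
    0.5 * sum_i w_i (y_i - b0 - x_i b)^2 + lam * sum_g sqrt(|g|) * ||b_g||_2,
    with the weights w rescaled to sum to 1.
    """
    w = w / w.sum()
    Xc = X - w @ X                                        # weighted centering profiles out b0
    yc = y - w @ y
    XtW = Xc.T * w                                        # X' W
    step = 1.0 / np.linalg.eigvalsh(XtW @ Xc).max()       # 1 / Lipschitz constant
    masks = [groups == g for g in np.unique(groups)]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (XtW @ (Xc @ beta - yc))        # gradient step
        for m in masks:                                   # group soft-thresholding
            norm = np.linalg.norm(z[m])
            t = step * lam * np.sqrt(m.sum())
            z[m] = 0.0 if norm <= t else z[m] * (1.0 - t / norm)
        beta = z
    return beta


def selection_frequency(X, y, groups, lam, n_resamples=50, seed=0):
    """Fraction of weighted refits in which each SNP's group is nonzero."""
    rng = np.random.default_rng(seed)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / sd                        # standardize columns once
    labels = np.unique(groups)
    hits = np.zeros(len(labels))
    for _ in range(n_resamples):
        w = rng.dirichlet(np.ones(len(y)))                # fractional observation weights
        beta = weighted_group_lasso(Xs, y, groups, w, lam)
        hits += np.array([np.any(beta[groups == g] != 0) for g in labels])
    return hits / n_resamples


# Simulated example: 5 SNPs at allele frequency 0.5; SNP 0 has a pure dominance effect.
rng = np.random.default_rng(1)
G = rng.choice(3, size=(300, 5), p=[0.25, 0.5, 0.25])
X, groups = encode_additive_dominance(np.eye(3)[G])     # hard calls as one-hot probabilities
y = 1.5 * (G[:, 0] == 1) + rng.normal(size=300)
freq = selection_frequency(X, y, groups, lam=0.2)
print(np.round(freq, 2))
```

At allele frequency 0.5, the heterozygote indicator is uncorrelated with the allele count in the population, so a purely additive coding would carry essentially no signal for SNP 0 in this simulation; the grouped additive-plus-dominance coding lets it be selected. Passing imputed genotype probabilities instead of one-hot calls gives the heterozygote-dosage variant under the same assumed coding.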