Luo Yiwen, Maity Arnab, Wu Michael C, Smith Chris, Duan Qing, Li Yun, Tzeng Jung-Ying
Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America.
Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America.
Genet Epidemiol. 2018 Apr;42(3):276-287. doi: 10.1002/gepi.22102. Epub 2017 Dec 26.
Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity-based collapsing tests (e.g., SKAT) may be affected more severely by PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal component (PC) or variance component (VC) based PS correction methods. We consider confounding effects caused by PS, including stratified populations, admixed populations, and spatially distributed nongenetic risk, and we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except for admixture, although the number of PCs that is sufficient depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require an equal or smaller number of PCs to be sufficient and often achieve higher power than PCs based on other variant types. (iii) VC-based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct the VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than that of VC based on other variant types. Given that the best-performing method and the choice of variants depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.
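As an illustration of what "PCs based on a given type of variants" could look like in practice, the following is a minimal Python sketch, not the authors' implementation. It computes top principal components from a standardized genotype matrix restricted to a minor allele frequency (MAF) window. The function name `genotype_pcs`, the `maf_range` parameter, the simulated two-population data, and the MAF cutoff of 0.05 are all assumptions made for illustration; the abstract does not state the cutoffs used to define common, less frequent, and rare variants. The SKAT test itself is not shown.

```python
import numpy as np


def genotype_pcs(G, n_pcs=2, maf_range=(0.0, 0.5)):
    """Top PCs of a genotype matrix (samples x variants, coded 0/1/2).

    Hypothetical helper: only variants with lo < MAF <= hi are used,
    so different variant classes can be selected via maf_range.
    """
    G = np.asarray(G, dtype=float)
    freq = G.mean(axis=0) / 2.0            # sample allele frequency per variant
    maf = np.minimum(freq, 1.0 - freq)     # minor allele frequency
    lo, hi = maf_range
    keep = (maf > lo) & (maf <= hi)        # lo = 0 also drops monomorphic variants
    p = freq[keep]
    # Center each variant and scale by its binomial standard deviation.
    X = (G[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]        # sample scores on the top PCs


# Simulated example (assumed data): two subpopulations with shifted frequencies.
rng = np.random.default_rng(0)
f1 = rng.uniform(0.01, 0.5, size=200)
f2 = np.clip(f1 + rng.normal(0.0, 0.1, size=200), 0.01, 0.99)
G = np.vstack([
    rng.binomial(2, f1, size=(50, 200)),
    rng.binomial(2, f2, size=(50, 200)),
])

pcs_all = genotype_pcs(G, n_pcs=2)                             # all polymorphic variants
pcs_common = genotype_pcs(G, n_pcs=2, maf_range=(0.05, 0.5))   # assumed "common" cutoff
```

The resulting columns would then be included as covariates in the null model of the association test. A VC-based correction would instead use a sample-by-sample similarity matrix built from the same standardized genotypes as the covariance of a random effect.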