Abegaz Fentaw, Van Lishout François, Mahachie John Jestinah M, Chaichoompu Kridsadakorn, Bhardwaj Archana, Duroux Diane, Gusareva Elena S, Wei Zhi, Hakonarson Hakon, Van Steen Kristel
GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.
Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA.
BioData Min. 2021 Feb 19;14(1):16. doi: 10.1186/s13040-021-00247-w.
In genome-wide association studies, the extent and impact of confounding due to population structure are well recognized. Inadequate handling of such confounding is likely to produce spurious associations, hampering both replication and the identification of causal variants. Several strategies have been developed to protect associations against confounding; the most popular is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework.
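As a rough illustration of the Principal Component Analysis strategy mentioned above, the following minimal sketch computes leading principal components from a genotype matrix and removes their linear effect from a trait. It is a generic textbook-style adjustment, not the authors' MBMDR-PC procedure; the function names `top_principal_components` and `residualize`, the simulated allele frequencies, and the choice of two components are all assumptions made for the example.

```python
import numpy as np

def top_principal_components(genotypes, n_components=2):
    """Return leading PC scores of a (samples x SNPs) genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    # Centre each SNP column and scale it by its standard deviation.
    centred = g - g.mean(axis=0)
    sd = centred.std(axis=0)
    sd[sd == 0] = 1.0  # leave monomorphic SNPs unscaled
    z = centred / sd
    # SVD of the standardized matrix; u * s gives the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def residualize(trait, pcs):
    """Regress the trait on an intercept plus the PCs; return residuals."""
    y = np.asarray(trait, dtype=float)
    design = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Hypothetical data: two subpopulations with different allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.5, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, size=200), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(100, 200)),
    rng.binomial(2, freq_b, size=(100, 200)),
])
label = np.repeat([0, 1], 100)
# The trait depends only on subpopulation membership, not on any SNP.
trait = 1.0 * label + rng.normal(size=200)

pcs = top_principal_components(geno, n_components=2)
adjusted = residualize(trait, pcs)
```

Downstream association or interaction tests would then be run on `adjusted` rather than on the raw trait. A linear adjustment of this kind only captures structure that the retained components represent, which is one reason nonlinear substructure may need separate treatment.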
To identify causal variants acting in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC.
Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG control the type I error rate at the nominal level more consistently than MBMDR-GC. Moreover, all three proposed population structure corrections outperform MDR-SP in terms of statistical power.
We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.