Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Bioinformatics. 2018 Jul 15;34(14):2341-2348. doi: 10.1093/bioinformatics/bty094.
For many traits, the causal loci uncovered by genetic mapping studies explain only a minority of the heritable contribution to trait variation. Multiple explanations for this 'missing heritability' have been proposed. Single nucleotide polymorphism (SNP)-SNP interaction (epistasis), one of the more compelling explanations, has been widely studied. However, a genome-wide scan for epistasis, especially for quantitative traits, poses huge computational challenges. Moreover, covariate adjustment is largely ignored in epistasis analysis because of the substantial extra computation it requires.
In the current study, we found striking differences among epistasis models using both simulated data and real biological data, suggesting that covariate adjustment not only removes confounding bias but can also improve power. Furthermore, we derived mathematical formulas that allow an exhaustive epistasis scan with full covariate adjustment to be expressed in terms of large matrix operations, substantially improving computational efficiency (∼10⁴× faster than existing methods). We call the new method MatrixEpistasis. With MatrixEpistasis, we re-analyzed a large real yeast dataset comprising 11 623 SNPs, 1008 segregants and 46 quantitative traits with covariates fully adjusted, and detected thousands of novel putative epistatic interactions with P-values < 1.48e-10.
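To illustrate how a covariate-adjusted interaction test can be written as matrix operations, here is a minimal NumPy sketch. It is not the authors' derivation and not the API of the MatrixEpistasis R package. It assumes the usual linear model y ~ covariates + x_i + x_j + x_i·x_j and relies on two standard results: residualizing on shared regressors (the Frisch-Waugh-Lovell theorem) and the partial-correlation recursion. The function names (`residualize`, `unit_columns`, `interaction_t_for_snp`) and the input layout are hypothetical, chosen for illustration only.

```python
import numpy as np


def residualize(A, B):
    """Return A minus its least-squares projection onto the columns of B."""
    coef, *_ = np.linalg.lstsq(B, A, rcond=None)
    return A - B @ coef


def unit_columns(A):
    """Scale every column of A to unit Euclidean length."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return A / np.linalg.norm(A, axis=0)


def interaction_t_for_snp(y, X, Z, i):
    """t-statistics of the x_i * x_j interaction term, for all SNPs j at once.

    Assumed layout (illustrative): y is an (n,) trait vector, X an (n, p)
    genotype matrix, Z an (n, q) covariate matrix without an intercept.
    """
    n, _ = X.shape
    # Regressors shared by every pair (i, j): intercept, covariates, x_i.
    C = np.column_stack([np.ones(n), Z, X[:, i]])
    # All interaction columns x_i * x_j in one element-wise product.
    W = X * X[:, [i]]

    # Step 1: remove the shared regressors from trait, SNPs and interactions.
    y1 = residualize(y[:, None], C)
    X1 = residualize(X, C)
    W1 = residualize(W, C)

    # Step 2: remove x_j column by column via the partial-correlation recursion.
    yn, Xn, Wn = unit_columns(y1), unit_columns(X1), unit_columns(W1)
    r_yw = (yn * Wn).sum(axis=0)
    r_yx = (yn * Xn).sum(axis=0)
    r_wx = (Wn * Xn).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (r_yw - r_yx * r_wx) / np.sqrt((1 - r_yx**2) * (1 - r_wx**2))
        # Residual degrees of freedom: n minus shared regressors, x_j and x_i*x_j.
        df = n - C.shape[1] - 2
        t = r * np.sqrt(df / (1 - r**2))
    t[i] = np.nan  # the pair (i, i) is degenerate and is not tested
    return t
```

Calling `interaction_t_for_snp` once per SNP index covers every pair without fitting a separate regression for each one; the cost is dominated by a few n × p matrix operations per SNP. For the 11 623 SNPs mentioned above, an exhaustive scan covers 11 623 × 11 622 / 2 = 67 541 253 unordered pairs per trait.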
The method is implemented in R and available at https://github.com/fanglab/MatrixEpistasis.
Supplementary data are available at Bioinformatics online.