He Dan, Parida Laxmi
IBM T.J. Watson Research, Yorktown Heights, NY, USA
Pac Symp Biocomput. 2017;22:426-437. doi: 10.1142/9789813207813_0040.
Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role in plant and animal breeding, as well as in genetic epidemiology, for example in the study of complex diseases. Such prediction helps in developing breeding strategies and is crucial for translating genetic findings into precision medicine. Epistasis, the phenomenon in which SNPs interact with each other, has been studied extensively in genome-wide association studies (GWAS) but has received relatively little attention in quantitative genetic trait prediction. Because the number of possible interactions is generally extremely large, even handling pairwise interactions is very challenging. To our knowledge, there is not yet a solid method that uses epistasis to improve genetic trait prediction. In this work, we study the multi-locus epistasis problem, in which interactions among more than two SNPs are considered. We developed an efficient algorithm, MUSE, that improves genetic trait prediction with the help of multi-locus epistasis. MUSE is sampling-based, and we propose several sampling strategies. Our experiments on real data show that MUSE is both efficient and effective at improving genetic trait prediction. MUSE also achieved very significant improvements on a real plant data set as well as a real human data set.