Laboratoire des Sciences du Numérique de Nantes (LS2N), Centre National de la Recherche Scientifique UMR 6004, University of Nantes, Nantes, France.
Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes, Nantes, France.
Bioinformatics. 2018 Aug 15;34(16):2773-2780. doi: 10.1093/bioinformatics/bty154.
Large-scale genome-wide association studies (GWAS) are tools of choice for discovering associations between genotypes and phenotypes. To date, many studies rely on univariate statistical tests for association between the phenotype and each assayed single nucleotide polymorphism (SNP). However, interaction between SNPs, known as epistasis, must be considered when tackling the complexity of the underlying biological mechanisms. Large-scale epistasis analysis entails a prohibitive computational burden when the goal is to detect more than two interacting SNPs. In this paper, we introduce a stochastic causal graph-based method, SMMB, to analyze epistatic patterns in GWAS data.
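To illustrate why univariate tests can miss epistasis, here is a minimal sketch on toy data. It is not SMMB and is not taken from the paper: the binary carrier/non-carrier coding, the XOR-style disease model, the sample size and the helper name `chi_square` are all assumptions chosen for illustration. Each SNP taken alone shows no association with the phenotype, while the pair taken jointly shows a strong one.

```python
from collections import Counter
from itertools import product


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of xs against ys."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for x, y in product(row_totals, col_totals):
        expected = row_totals[x] * col_totals[y] / n
        stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy cohort: 25 individuals for each combination of two
# binary SNPs (0 = non-carrier, 1 = carrier). The phenotype follows a
# pure interaction (XOR) model, an assumption made for illustration only.
snp_a, snp_b, phenotype = [], [], []
for a, b in product((0, 1), repeat=2):
    for _ in range(25):
        snp_a.append(a)
        snp_b.append(b)
        phenotype.append(a ^ b)

# Univariate tests: each SNP against the phenotype.
chi2_a = chi_square(snp_a, phenotype)
chi2_b = chi_square(snp_b, phenotype)

# Joint test: the SNP pair, treated as one 4-level variable, against the phenotype.
chi2_joint = chi_square(list(zip(snp_a, snp_b)), phenotype)

print(f"SNP A alone: {chi2_a:.1f}")
print(f"SNP B alone: {chi2_b:.1f}")
print(f"SNP pair:    {chi2_joint:.1f}")
```

In this idealized case both univariate statistics are zero, so only the joint test reveals the signal. Testing every pair, triple or larger subset in this way is what becomes computationally prohibitive at genome scale.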
We present the Stochastic Multiple Markov Blanket (SMMB) algorithm, which combines an ensemble stochastic strategy inspired by random forests with Bayesian Markov blanket-based methods. We compared SMMB with three other recent algorithms using both simulated and real datasets. Our method outperforms the other compared methods in a majority of simulated cases of 2-way and 3-way epistasis patterns (especially in scenarios where the minor allele frequencies of the causal SNPs are low). On large real datasets, our approach performs similarly to two of the compared methods in terms of power, and runs faster.
A parallel version is available at https://ls2n.fr/listelogicielsequipe/DUKe/128/.
Supplementary data are available at Bioinformatics online.