Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, 66045, USA.
BMC Bioinformatics. 2010 Apr 29;11 Suppl 3(Suppl 3):S5. doi: 10.1186/1471-2105-11-S3-S5.
Detecting epistatic interactions associated with complex and common diseases can help improve the prevention, diagnosis, and treatment of these diseases. With the development of genome-wide association studies (GWAS), designing powerful and robust computational methods for identifying epistatic interactions associated with common diseases has become a major challenge for the bioinformatics community, because such studies must handle large volumes of genotype data and an enormous number of possible combinations of genetic factors. Most existing computational detection methods are based on the classification capacity of SNP sets, which may fail to identify SNP sets that are strongly associated with disease and may introduce many false positives. In addition, most methods are not suitable for genome-wide studies because of their computational complexity.
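To give a rough sense of the combinatorial scale mentioned above, the following sketch counts the candidate SNP sets for two- and three-way interactions. The panel size of 500,000 SNPs is an assumption chosen for illustration and is not a figure taken from this study; at that size there are already about 1.25 × 10^11 SNP pairs.

```python
from math import comb

# Assumed panel size, for illustration only; not a figure from the paper.
N_SNPS = 500_000


def n_snp_sets(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of the given interaction order."""
    return comb(n_snps, order)


for order in (2, 3):
    print(f"order {order}: {n_snp_sets(N_SNPS, order):,} candidate SNP sets")
```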
We propose a new Markov blanket-based method, DASSO-MB (Detection of ASSOciations using Markov Blanket), to detect epistatic interactions in case-control GWAS. The Markov blanket of a target variable T completely shields T from all other variables; that is, T is conditionally independent of every other variable given its Markov blanket. Thus, we can guarantee that the SNP set detected by DASSO-MB has a strong association with the disease and contains the fewest false positives. Furthermore, DASSO-MB uses a heuristic search strategy based on computing the association between variables, which avoids the time-consuming training process required by other machine-learning methods. We apply our algorithm to simulated datasets and a real case-control dataset. We compare DASSO-MB with other commonly used methods and show that it significantly outperforms them and is capable of finding SNPs strongly associated with diseases.
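The idea of searching for a Markov blanket by measuring association, rather than by training a classifier, can be sketched as follows. This is a generic greedy forward and backward search and should not be read as the DASSO-MB implementation. The abstract does not specify the association measure or the stopping rule, so the G² statistic, the fixed `threshold`, and the names `g2_statistic` and `markov_blanket_sketch` are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def g2_statistic(x, y, z_cols=()):
    """G^2 association between x and y, stratified by the columns in z_cols."""
    n = len(x)
    # Each sample's stratum is the tuple of its values in the conditioning columns.
    strata = [tuple(col[i] for col in z_cols) for i in range(n)]
    n_xyz = Counter(zip(x, y, strata))
    n_xz = Counter(zip(x, strata))
    n_yz = Counter(zip(y, strata))
    n_z = Counter(strata)
    g2 = 0.0
    for (xv, yv, zv), observed in n_xyz.items():
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * observed * log(observed / expected)
    return g2


def markov_blanket_sketch(snps, disease, threshold=10.0):
    """Greedy blanket search; `threshold` is an arbitrary illustrative cutoff."""
    blanket = []
    # Forward phase: repeatedly add the SNP most associated with the
    # disease given the SNPs already selected.
    while True:
        cond = [snps[s] for s in blanket]
        scores = {
            name: g2_statistic(col, disease, cond)
            for name, col in snps.items()
            if name not in blanket
        }
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        blanket.append(best)
    # Backward phase: drop SNPs that become uninformative given the rest.
    for name in list(blanket):
        others = [snps[s] for s in blanket if s != name]
        if g2_statistic(snps[name], disease, others) < threshold:
            blanket.remove(name)
    return blanket


# Toy data: disease status depends only on SNP "A"; SNP "B" is unrelated.
n = 240
snp_a = [i % 3 for i in range(n)]
snp_b = [(i // 3) % 2 for i in range(n)]
status = [1 if a == 2 else 0 for a in snp_a]
print(markov_blanket_sketch({"A": snp_a, "B": snp_b}, status))
```

A real analysis would compare G² against a chi-square distribution with the appropriate degrees of freedom instead of a fixed cutoff. Note also that a greedy forward step of this kind can miss interactions in which no single SNP shows a marginal effect.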
Our study shows that DASSO-MB can identify a minimal set of causal SNPs associated with diseases, which contains fewer false positives than the sets found by other existing methods. Given the huge size of the genomic datasets produced by GWAS, this is critical for reducing the potential cost of biological experiments and for providing an efficient guide for pathogenesis research.