Machine Learning and Computational Biology Research Group, Max Planck Institute for Developmental Biology & Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tübingen, Germany.
Bioinformatics. 2013 Jul 1;29(13):i171-9. doi: 10.1093/bioinformatics/btt238.
As an increasing number of genome-wide association studies reveal the limitations of explaining phenotypic heritability by single genetic loci, attention has recently shifted to associating complex phenotypes with sets of genetic loci. Although several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account either are restricted to investigating a limited number of predetermined sets of loci or do not scale to genome-wide settings.
We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci and exhibits higher power in detecting causal SNPs in simulation studies than other methods. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature.
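The minimum cut reformulation mentioned above can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not spell out the objective. The sketch assumes one common form: each feature i has an additive association score c_i, a sparsity penalty eta is paid per selected feature, and a connectivity penalty lam times the edge weight is paid for every network edge that leaves the selected set. Under these assumptions, features with c_i > eta are attached to a source node, the others to a sink node, and the selected set is the source side of a minimum source-sink cut. The function name, the parameter names and the plain Edmonds-Karp max-flow routine are all illustrative choices.

```python
from collections import deque


def select_connected_features(scores, edges, eta, lam):
    """Pick features by a minimum s-t cut (illustrative sketch only).

    scores: list of per-feature association scores c_i (assumed additive)
    edges:  list of (i, j, w) undirected network edges with weight w >= 0
    eta:    assumed sparsity penalty paid per selected feature
    lam:    assumed connectivity penalty per unit of cut edge weight
    Returns the sorted indices of the selected features.
    """
    n = len(scores)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_arc(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse arc for the residual graph

    # Features worth more than the sparsity penalty hang off the source,
    # the others hang off the sink.
    for i, c in enumerate(scores):
        gain = c - eta
        if gain > 0:
            add_arc(source, i, gain)
        elif gain < 0:
            add_arc(i, sink, -gain)

    # Each undirected network edge becomes two arcs of capacity lam * w.
    for i, j, w in edges:
        add_arc(i, j, lam * w)
        add_arc(j, i, lam * w)

    def reachable_from_source():
        """Breadth-first search over arcs with residual capacity left."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: push flow along shortest augmenting paths until the
    # sink can no longer be reached in the residual graph.
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from the source form the source side of a
    # minimum cut, which is the selected feature set.
    return sorted(v for v in parent if v < n)


# Toy example: a chain network 0-1-2-3-4 with two strongly associated loci.
toy_scores = [5.0, 1.0, 5.0, 0.0, 0.0]
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
selected = select_connected_features(toy_scores, toy_edges, eta=2.0, lam=1.5)
print(selected)
```

On this toy chain the call selects features 0, 1 and 2: the weakly associated feature 1 is included because it connects the two strong features, which costs less than cutting the edges around it. With lam set to 0 the network is ignored and only features 0 and 2 are selected. For genome-wide problems a dedicated max-flow solver would be needed in place of this simple routine.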
Code is available at http://webdav.tuebingen.mpg.de/u/karsten/Forschung/scones/.
Supplementary data are available at Bioinformatics online.