Tandy School of Computer Science, University of Tulsa, Tulsa, Oklahoma 74104, USA.
Genet Epidemiol. 2013 Sep;37(6):614-21. doi: 10.1002/gepi.21739. Epub 2013 Jun 5.
Open source tools are needed to facilitate the construction, analysis, and visualization of gene-gene interaction networks for sequencing data. To address this need, we present Encore, an open source network analysis pipeline for genome-wide association studies and rare variant data. Encore constructs Genetic Association Interaction Networks, or epistasis networks, using either of two approaches: our previous information-theory method or a generalized linear model approach. Additionally, Encore includes multiple data filtering options, including Random Forest/Random Jungle for main effect enrichment and Evaporative Cooling and Relief-F filters for enrichment of interaction effects. Encore implements SNPrank network centrality for identifying susceptibility hubs (nodes that carry a large amount of disease susceptibility information through the combination of multivariate main effects and multiple gene-gene interactions in the network), and it produces files for interactive visualization of a network using tools from our online Galaxy instance. We implemented these algorithms in C++ using OpenMP for shared-memory parallel analysis on a server or desktop. To demonstrate Encore's utility in the analysis of genetic sequencing data, we present an analysis of exome resequencing data from healthy individuals and individuals with Systemic Lupus Erythematosus (SLE). Our results verify the importance of the previously associated SLE genes HLA-DRB and NCF2, and these two genes had the highest gene-gene interaction degrees among the susceptibility hubs. An additional 14 genes previously associated with SLE emerged in our epistasis network model of the exome data, and three novel candidate genes, ST8SIA4, CMTM4, and C2CD4B, were implicated in the model. In summary, we present a comprehensive tool for epistasis network analysis and the first such analysis of exome data from a genetic study of SLE.
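The abstract describes SNPrank only at a high level: hubs are nodes whose importance combines main effects with many gene-gene interactions. As a rough illustration of that idea, the sketch below ranks the nodes of a GAIN-like square matrix with a PageRank-style power iteration, treating diagonal entries (main effects) as the teleportation prior and off-diagonal entries (interactions) as edge weights. This is a simplified stand-in written in Python, not Encore's C++ SNPrank implementation, and its exact weighting may differ from the published method. The function name `snprank_like`, the damping factor `gamma = 0.85`, and the clamping of negative scores to zero are all hypothetical choices made for illustration.

```python
from typing import List


def snprank_like(gain: List[List[float]], gamma: float = 0.85,
                 tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """PageRank-style centrality on a square GAIN-like matrix (hypothetical helper).

    Assumptions: gain[i][i] is a main-effect score for variant i and
    gain[i][j] (i != j) is an interaction weight; negative values are
    clamped to zero so the walk stays a valid probability process.
    """
    n = len(gain)
    if n == 0:
        return []
    # Teleportation prior proportional to main effects (diagonal entries).
    main = [max(gain[i][i], 0.0) for i in range(n)]
    total_main = sum(main)
    prior = [m / total_main for m in main] if total_main > 0 else [1.0 / n] * n
    # Non-negative interaction weights (off-diagonal entries only).
    w = [[max(gain[i][j], 0.0) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [0.0] * n
        dangling = 0.0  # rank mass held by nodes with no interaction edges
        for i in range(n):
            if out_weight[i] > 0:
                # Spread this node's damped rank along its interaction edges.
                for j in range(n):
                    new_rank[j] += gamma * rank[i] * w[i][j] / out_weight[i]
            else:
                dangling += rank[i]
        # Redistribute teleportation and dangling mass according to the prior.
        jump = (1.0 - gamma) * sum(rank) + gamma * dangling
        for j in range(n):
            new_rank[j] += jump * prior[j]
        diff = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if diff < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy network: variant 0 interacts with 1 and 2; main effects are equal.
    toy_gain = [[1.0, 1.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
    print(snprank_like(toy_gain))
```

In the toy matrix the interaction hub (variant 0) converges to about 0.49, versus about 0.26 for each of the other two variants, and the scores sum to 1. Tying the teleportation prior to the diagonal is what lets a variant with a strong main effect but few interactions still receive a nonzero score, which mirrors the abstract's description of hubs as combining main effects with interactions.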