Gui Shupeng, Rice Andrew P, Chen Rui, Wu Liang, Liu Ji, Miao Hongyu
Department of Computer Science, University of Rochester, Rochester, 14620, NY, USA.
Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, 77030, TX, USA.
BMC Bioinformatics. 2017 Jan 31;18(1):74. doi: 10.1186/s12859-017-1489-z.
Gene regulatory interactions are of fundamental importance to various biological functions and processes. However, only a few previous computational studies have claimed success in revealing genome-wide regulatory landscapes from temporal gene expression data, especially for complex eukaryotes such as humans. Moreover, recent work suggests that these methods still suffer from the curse of dimensionality once the network size reaches 100 or more.
Here we present a novel scalable algorithm for identifying genome-wide gene regulatory network (GRN) structures, and we have verified the algorithm's performance through extensive simulation studies based on the DREAM challenge benchmark data. The highlight of our method is that its superior performance does not degrade even for a network size on the order of 10^4, and it is thus readily applicable to large-scale complex networks. This is achieved by considering both prior biological knowledge and multiple topological properties (i.e., sparsity and hub gene structure) of complex networks in the regularized formulation. We also validate and illustrate the application of our algorithm in practice using time-course gene expression data from a study on human respiratory epithelial cells in response to influenza A virus (IAV) infection, as well as ChIP-seq data from ENCODE on transcription factor (TF) and target gene interactions. An interesting finding from the proposed algorithm is that the biggest hub structures (e.g., the top ten) in the GRN are all centered on transcription factors in the context of epithelial cell infection by IAV.
The proposed algorithm is the first scalable method for large complex network structure identification. The GRN structure identified by our algorithm could reveal possible biological links and help researchers choose which gene functions to investigate in a biological event. The algorithm described in this article is implemented in MATLAB, and the source code is freely available from https://github.com/Hongyu-Miao/DMI.git.
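As a rough illustration of what a regularized formulation with sparsity and hub-structure penalties can look like, the following Python sketch fits a linear network model by proximal gradient descent. It is not the authors' algorithm, whose MATLAB implementation is in the repository above. The linear model Y ≈ A X, the column-wise group penalty used as a stand-in for hub structure, the 0/1 `prior_mask` used as a stand-in for prior biological knowledge, and all function and parameter names are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(M, tau):
    """Elementwise soft-thresholding: proximal operator of tau * ||M||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def column_shrink(M, tau):
    """Shrink each column toward zero by tau in Euclidean norm
    (proximal operator of tau * sum_j ||M[:, j]||_2)."""
    norms = np.linalg.norm(M, axis=0, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale


def fit_grn(X, Y, lam_sparse=0.1, lam_hub=0.1, prior_mask=None, n_iter=500):
    """Hypothetical sketch: estimate a gene-by-gene matrix A minimizing
        0.5 * ||Y - A X||_F^2
        + lam_sparse * ||A||_1              (few edges overall)
        + lam_hub * sum_j ||A[:, j]||_2     (few regulators, each a potential hub)
    X, Y: arrays of shape (genes, samples); column j of A holds the
    outgoing edges of gene j. prior_mask (optional, 0/1, genes x genes)
    forces entries with mask 0 to stay exactly zero.
    """
    n_genes = X.shape[0]
    A = np.zeros((n_genes, n_genes))
    # Lipschitz constant of the gradient is the squared top singular value of X.
    lipschitz = np.linalg.norm(X, 2) ** 2
    step = 1.0 / max(lipschitz, 1e-12)
    for _ in range(n_iter):
        grad = (A @ X - Y) @ X.T          # gradient of the least-squares term
        A = A - step * grad
        if prior_mask is not None:
            A = A * prior_mask            # drop edges ruled out by prior knowledge
        A = soft_threshold(A, step * lam_sparse)
        A = column_shrink(A, step * lam_hub)
    return A


# Small synthetic usage example: gene 0 acts as a hub regulating all genes.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((5, 50))
A_true = np.zeros((5, 5))
A_true[:, 0] = [0.5, -0.8, 0.6, 0.7, -0.4]
Y_demo = A_true @ X_demo
A_hat = fit_grn(X_demo, Y_demo, lam_sparse=0.5, lam_hub=0.5)
```

In this sketch, Y could hold next-time-point expression or finite-difference derivatives computed from the time course. The two penalty weights trade off overall sparsity against concentrating edges in a few regulator columns, and would need tuning (for example by cross-validation) on real data.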