Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706-1613, USA.
Wisconsin Institute for Discovery, University of Wisconsin-Madison, Discovery Building, 330 North Orchard St., Madison, WI 53715, USA.
Nucleic Acids Res. 2017 Feb 28;45(4):e21. doi: 10.1093/nar/gkw963.
Transcriptional regulatory networks specify the regulatory proteins that control the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation but remains an open challenge. Expression-based network inference is among the most popular approaches to inferring regulatory networks; however, networks inferred by such methods have low overlap with experimentally derived networks (e.g. from ChIP-chip and transcription factor (TF) knockouts). Currently, we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, that integrates expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our method and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests that natural genetic variation is the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple factors make the targets of other TFs difficult to identify, including network architecture and insufficient variation in TF mRNA levels. Finally, we demonstrate the utility of our inference algorithm for inferring stress-specific regulatory networks and for regulator prioritization.