Kang Yiming, Liow Hien-Haw, Maier Ezekiel J, Brent Michael R
Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA.
Department of Mathematics, Washington University, Saint Louis, MO, USA.
Bioinformatics. 2018 Jan 15;34(2):249-257. doi: 10.1093/bioinformatics/btx563.
Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and 'integrative' algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types.
We present NetProphet 2.0, a 'data light' algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map.
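The first two principles can be illustrated with a small sketch. The abstract does not say how NetProphet 2.0 combines its constituent methods or how it shares information among TFs with similar DNA binding domains, so everything below is an assumption made for illustration: rank averaging stands in for the combination step, a similarity-weighted average stands in for the sharing step, and the names `rank_normalize`, `combine_maps`, `share_across_similar_tfs` and the `weight` parameter are hypothetical. Score matrices are assumed to have one row per TF and one column per gene.

```python
import numpy as np


def rank_normalize(scores):
    """Map every entry to (0, 1] by its rank; the highest score maps to 1."""
    flat = scores.ravel()
    order = flat.argsort()
    ranks = np.empty(flat.size, dtype=float)
    ranks[order] = np.arange(1, flat.size + 1)
    return (ranks / flat.size).reshape(scores.shape)


def combine_maps(score_maps):
    """Average rank-normalized TF-by-gene score matrices.

    Assumption: rank averaging is used here only as a generic way to
    combine methods whose raw scores are on different scales.
    """
    return np.mean([rank_normalize(s) for s in score_maps], axis=0)


def share_across_similar_tfs(scores, similarity, weight=0.5):
    """Blend each TF's scores with those of TFs that have similar DBDs.

    similarity: nonnegative TF-by-TF matrix with a zero diagonal
    (hypothetical input, e.g. derived from DNA binding domain alignment).
    weight: hypothetical mixing factor between own and neighbor scores.
    TFs with no similar neighbors keep their original scores.
    """
    row_sums = similarity.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    neighbor_avg = (similarity / safe_sums) @ scores
    blended = (1 - weight) * scores + weight * neighbor_avg
    return np.where(row_sums > 0, blended, scores)


# Two toy score matrices (2 TFs x 2 genes) from two hypothetical methods.
method_a = np.array([[0.9, 0.1], [0.2, 0.8]])
method_b = np.array([[5.0, 1.0], [2.0, 4.0]])
combined = combine_maps([method_a, method_b])

# The two TFs are assumed to have similar DNA binding domains.
dbd_similarity = np.array([[0.0, 1.0], [1.0, 0.0]])
shared = share_across_similar_tfs(combined, dbd_similarity)
```

The third principle, inferring binding specificities from the promoters of predicted targets and feeding them back into the map, requires a motif discovery step and is not sketched here.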
Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0.
Supplementary data are available at Bioinformatics online.