James Gareth M, Sabatti Chiara, Zhou Nengfeng, Zhu Ji
University of Southern California, Stanford University, University of Michigan and University of Michigan.
Ann Appl Stat. 2010 Jun;4(2):663-686. doi: 10.1214/10-aoas350.
In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses L1 penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.
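The abstract states that L1 penalties are used to keep the estimated network sparse and that prior information about likely connections can be incorporated. As a rough illustration of that idea only, the sketch below fits the connection strengths for a single gene by coordinate descent with a separate L1 penalty per TF. Several things here are assumptions made for the example and are not taken from the abstract: the TF activation matrix `X` is treated as known (in the problem described above it is not), prior information is encoded as smaller penalties for TFs believed to regulate the gene, and the names `soft_threshold`, `weighted_lasso`, and `penalties` are hypothetical. This is not the authors' algorithm.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, penalties, n_sweeps=200):
    """Minimise 0.5 * sum_i (y[i] - sum_j X[i][j] * beta[j])**2
                + sum_j penalties[j] * abs(beta[j])
    by cyclic coordinate descent.

    X         : list of rows, one per experiment; columns are TF activations
                (ASSUMED known here, purely for illustration).
    y         : expression of one gene across the same experiments.
    penalties : one non-negative L1 penalty per TF (hypothetical encoding of
                prior knowledge: smaller penalty = connection more plausible).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a TF with no activity cannot explain anything
            # Correlation of column j with the residual that excludes TF j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new_value = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_value
    return beta


# Toy data: 4 experiments, 3 TFs; the gene mostly follows TF 0.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
y = [2.0, 0.1, -0.1, 2.0]
penalties = [0.2, 0.5, 0.5]  # TF 0 is favoured by the (made-up) prior

beta = weighted_lasso(X, y, penalties)
print(beta)  # TF 0 keeps a nonzero strength; TFs 1 and 2 are set exactly to zero
```

The soft-thresholding step is what produces exact zeros, which is why an L1 penalty yields a sparse set of TF-gene connections rather than many small nonzero ones.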