Reddy Timothy E, DeLisi Charles, Shakhnovich Boris E
Program in Bioinformatics and Systems Biology, Boston University, Boston, MA 02215, USA.
Genome Inform. 2005;16(1):59-67.
Genome-scale identification of transcription factor binding sites (TFBSs) is fundamental to understanding the complexities of mRNA expression at both the cellular and organismal levels. While high-throughput experimental methods provide associations between transcription factors and the genes they regulate under a specified experimental condition, computational methods are still required to pinpoint the exact location of binding. Moreover, since the binding site is an intrinsic property of the promoter region, computational methods are in principle more general than condition-dependent experimental methods. Computational identification of TFBSs is complicated in at least two ways. First, transcription factors bind a heterogeneous distribution of sites and therefore have a distribution of affinities. Second, not every sequence in the set from which a common site is to be determined contains a site for the TF of interest. In this paper, we evaluate the robustness of TFBS identification with respect to both effects. We show that adding upstream regions that lack the TFBS destroys the specificity of the predicted binding site. We also propose a method to calculate the distance between position weight matrices that can be used to measure "drift" from the canonical binding site. The results presented here could be useful in developing future transcription factor binding site identification algorithms.
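The abstract does not say how the distance between position weight matrices is computed, so the following is only an illustrative sketch and not the authors' method. It assumes two matrices of equal width built from aligned sites, and uses the mean per-column Jensen-Shannon divergence as a stand-in distance. The function names (`frequency_matrix`, `js_divergence`, `pwm_distance`), the pseudocount of 1, and the toy sequences are all hypothetical choices made for illustration.

```python
import math

ALPHABET = "ACGT"


def frequency_matrix(sites, pseudocount=1.0):
    """Column-wise base frequencies for equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(width):
        # Start every base at the pseudocount so no frequency is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        matrix.append([counts[base] / total for base in ALPHABET])
    return matrix


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pwm_distance(pwm_a, pwm_b):
    """Mean per-column JS divergence: 0 for identical matrices, at most 1."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("matrices must have the same width")
    total = sum(js_divergence(a, b) for a, b in zip(pwm_a, pwm_b))
    return total / len(pwm_a)


# Toy data (assumed, not from the paper): a clean site set, and the same
# set diluted with sequences that do not contain the site.
canonical = frequency_matrix(["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"])
drifted = frequency_matrix(["TGACTCA", "TGAGTCA", "CCGATTG", "AATGCGT"])

print(pwm_distance(canonical, canonical))
print(pwm_distance(canonical, drifted))
```

Under these assumptions, a matrix compared with itself scores zero, and a matrix built from a set diluted with non-site sequences scores higher, which is one way to quantify the "drift" the abstract describes. A real comparison would also need to handle matrices of different widths and alignment offsets, which this sketch omits.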