Cheng Chao, Yan Xiting, Sun Fengzhu, Li Lei M
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA.
BMC Bioinformatics. 2007 Nov 16;8:452. doi: 10.1186/1471-2105-8-452.
The identification of transcription factors (TFs) associated with a biological process is fundamental to understanding its regulatory mechanisms. In microarray data, however, changes in TF activity often cannot be observed directly because of the relatively low expression levels of TFs, post-transcriptional modifications, and other complications. Several approaches have been proposed to infer TF activity changes from microarray data. Some models assume a linear relationship between gene expression and TF-gene binding strength. Others first determine the target genes of a TF by applying a significance cutoff to binding affinity scores, and then test for differential expression between the target genes and the remaining genes.
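To make the second family of approaches concrete, the following is a minimal sketch of a cutoff-based test, not any published implementation. It assumes a hard threshold on a raw affinity score as a stand-in for a significance cutoff, and Welch's t statistic as the comparison; the function names and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(xs, ys):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(xs) - mean(ys)) / sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))


def cutoff_target_test(expr_change, affinity, cutoff):
    """Split genes into targets (affinity >= cutoff) and others, then compare
    their expression changes. The hard split is the assumption being illustrated."""
    targets = [e for e, b in zip(expr_change, affinity) if b >= cutoff]
    others = [e for e, b in zip(expr_change, affinity) if b < cutoff]
    if len(targets) < 2 or len(others) < 2:
        raise ValueError("each group needs at least two genes")
    return welch_t(targets, others)


# Toy data (made up for illustration): one value per gene.
expr_change = [2.0, 1.8, 1.5, 0.2, -0.1, 0.0, -0.3, 0.1]
affinity = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.05, 0.3]

t_strict = cutoff_target_test(expr_change, affinity, cutoff=0.5)
t_loose = cutoff_target_test(expr_change, affinity, cutoff=0.35)
```

In this toy example the statistic changes when the cutoff moves from 0.5 to 0.35, because one borderline gene switches groups. That sensitivity to the threshold is the kind of dependence a method without hard target selection avoids.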
We propose a novel method, referred to as BASE (binding association with sorted expression), to infer TF activity changes from microarray expression profiles with the help of binding affinity data. It searches for the maximum association between the binding affinity profile of a TF and the expression change profile along the direction of sorted differential expression. The method does not make a hard selection of target genes; instead, the significance of TF activity changes is evaluated at the end by permutation tests of the binding association. To show the effectiveness of the method, we apply it to three typical examples that use different kinds of binding affinity data: ChIP-chip data, motif discovery data, and position weight matrix scanning data. The implications obtained from all three examples are consistent with established biological results. Moreover, the inferences suggest new and biologically meaningful hypotheses for further investigation.
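The general idea can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact statistic, so the code assumes non-negative affinities, a foreground cumulative sum weighted by |expression change| × affinity, a background cumulative sum weighted by |expression change| alone, and a score equal to the largest signed deviation between the two along the sorted gene list. The function names and toy data are hypothetical.

```python
import random


def association_score(expr_change, affinity):
    """Largest signed gap between foreground and background cumulative
    fractions, scanning genes from most up- to most down-regulated.
    The form of the statistic is an assumption made for illustration."""
    order = sorted(range(len(expr_change)), key=lambda i: expr_change[i], reverse=True)
    fg = [abs(expr_change[i]) * affinity[i] for i in order]  # affinity-weighted
    bg = [abs(expr_change[i]) for i in order]                # unweighted
    total_fg, total_bg = sum(fg), sum(bg)
    if total_fg == 0 or total_bg == 0:
        return 0.0
    best = 0.0
    cum_fg = cum_bg = 0.0
    for a, b in zip(fg, bg):
        cum_fg += a
        cum_bg += b
        deviation = cum_fg / total_fg - cum_bg / total_bg
        if abs(deviation) > abs(best):
            best = deviation
    return best


def permutation_p_value(expr_change, affinity, n_perm=1000, seed=0):
    """Shuffle affinities across genes to build a null distribution of scores."""
    rng = random.Random(seed)
    observed = association_score(expr_change, affinity)
    shuffled = list(affinity)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(association_score(expr_change, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): the three most up-regulated genes carry all the affinity.
expr_change = [3.0, 2.5, 2.0, 0.1, -0.1, -2.0, -2.5, -3.0]
affinity = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

score, p_value = permutation_p_value(expr_change, affinity, n_perm=500)
```

No gene is ever labelled a target here: every gene contributes in proportion to its affinity, and significance comes only from comparing the observed score with scores obtained after shuffling the affinities.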
The proposed method makes transcriptional inferences from profiles of expression and binding affinity. The same machinery can handle various kinds of binding affinity data. The method does not require a linearity assumption, and it has the desirable property of scale-invariance with respect to TF-specific binding affinity. The method is easy to implement and can be routinely applied for transcriptional inference in microarray studies.
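One way such scale-invariance can arise is through normalisation. The derivation below is an illustration only and assumes a statistic whose affinity-dependent part is a normalised cumulative sum; this form is not stated in the abstract. Here $e_{(k)}$ and $b_{(k)} \ge 0$ denote the expression change and binding affinity of the gene ranked $k$-th by expression change, and $c > 0$ is an arbitrary TF-specific rescaling of the affinities.

```latex
f(i) = \frac{\sum_{k=1}^{i} |e_{(k)}|\, b_{(k)}}{\sum_{k=1}^{n} |e_{(k)}|\, b_{(k)}}
\quad\Longrightarrow\quad
f_c(i) = \frac{\sum_{k=1}^{i} |e_{(k)}|\, c\, b_{(k)}}{\sum_{k=1}^{n} |e_{(k)}|\, c\, b_{(k)}}
       = \frac{c \sum_{k=1}^{i} |e_{(k)}|\, b_{(k)}}{c \sum_{k=1}^{n} |e_{(k)}|\, b_{(k)}}
       = f(i)
```

Under this assumption the factor $c$ cancels, so affinity scores reported on different scales for different TFs would give the same statistic.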