Wu Wei-Sheng, Lai Fu-Jou
BMC Genomics. 2015;16 Suppl 12(Suppl 12):S10. doi: 10.1186/1471-2164-16-S12-S10. Epub 2015 Dec 9.
Transcriptional regulation of gene expression in eukaryotes is usually accomplished by cooperative transcription factors (TFs). Computational identification of cooperative TF pairs has become an active research topic, and many algorithms have been proposed in the literature. A typical algorithm for predicting cooperative TF pairs has two steps. (Step 1) Define the targets of each TF under study. (Step 2) Design a measure for calculating the cooperativity of a TF pair based on the targets of the two TFs. While different algorithms use distinct, sophisticated cooperativity measures, the targets of a TF are usually defined using ChIP-chip data. However, using ChIP-chip data to define the targets of a TF has an inherent weakness: ChIP-chip analysis identifies only the binding targets of a TF and cannot distinguish true regulatory targets from targets that are bound but not regulated.
This is the first study to investigate whether the performance of computational identification of cooperative TF pairs can be improved by using a more biologically relevant way to define the targets of a TF. For this purpose, we propose four simple algorithms, each of which consists of two steps. (Step 1) Define the targets of a TF using (i) ChIP-chip data in the first algorithm, (ii) TF binding data in the second algorithm, (iii) TF perturbation data in the third algorithm, and (iv) the intersection of TF binding and TF perturbation data in the fourth algorithm. Compared with the first three algorithms, the fourth algorithm uses a more biologically relevant way to define the targets of a TF. (Step 2) Measure the cooperativity of a TF pair by the statistical significance of the overlap between the targets of the two TFs, using the hypergeometric test. Using four existing performance indices, we show that the fourth proposed algorithm (PA4) significantly outperforms the other three proposed algorithms. This suggests that the computational identification of cooperative TF pairs is indeed improved when a more biologically relevant way is used to define the targets of a TF. Strikingly, the prediction results of our simple PA4 are more biologically meaningful than those of the 12 existing sophisticated algorithms in the literature, all of which used ChIP-chip data to define the targets of a TF. This suggests that properly defining the targets of a TF may be more important than designing sophisticated cooperativity measures. In addition, PA4 predicts several experimentally validated cooperative TF pairs that have not been successfully predicted by any existing algorithm in the literature.
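The two steps described for the fourth algorithm can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function names (`regulatory_targets`, `overlap_pvalue`), the toy gene identifiers, and the genome size of 10 are hypothetical and chosen only for illustration. It assumes the hypergeometric test is the usual one-sided (upper-tail) test on the number of shared targets.

```python
from math import comb


def regulatory_targets(binding_targets, perturbation_targets):
    """Step 1 (sketch): targets supported by both binding and perturbation data."""
    return set(binding_targets) & set(perturbation_targets)


def overlap_pvalue(n_genes, targets_a, targets_b):
    """Step 2 (sketch): one-sided hypergeometric p-value for the target overlap.

    Returns P(X >= k), where k is the observed number of shared targets and
    X is the overlap expected when len(targets_b) genes are drawn at random
    from n_genes genes, len(targets_a) of which are targets of the first TF.
    """
    a, b = set(targets_a), set(targets_b)
    k = len(a & b)
    size_a, size_b = len(a), len(b)
    # Sum the hypergeometric probability mass from k up to the largest possible overlap.
    tail = sum(
        comb(size_a, i) * comb(n_genes - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return tail / comb(n_genes, size_b)


# Toy data (hypothetical gene identifiers, not from the study).
tf1 = regulatory_targets({"g1", "g2", "g3", "g4", "g7"}, {"g1", "g2", "g3", "g4", "g9"})
tf2 = regulatory_targets({"g3", "g4", "g5", "g6"}, {"g3", "g4", "g5", "g6", "g8"})
p_value = overlap_pvalue(10, tf1, tf2)
```

A smaller p-value indicates that the two TFs share more targets than expected by chance, which in this scheme is read as stronger evidence of cooperativity. Any significance threshold or multiple-testing correction is left out here because the abstract does not specify one.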
This study shows that the performance of computational identification of cooperative TF pairs can be improved by using a more biologically relevant way to define the targets of a TF. The main contribution of this study is not another new algorithm but a new way of thinking about the computational identification of cooperative TF pairs. Researchers should put more effort into properly defining the targets of a TF (i.e. Step 1) rather than focusing solely on designing sophisticated cooperativity measures (i.e. Step 2). The lists of TF target genes, the Matlab code, and the prediction results of the four proposed algorithms can be downloaded from our companion website http://cosbi3.ee.ncku.edu.tw/TFI/.