Sikdar Sinjini, Datta Susmita
Department of Biostatistics, University of Florida, Gainesville, FL, 32611, USA.
BMC Bioinformatics. 2017 Feb 2;18(1):79. doi: 10.1186/s12859-017-1499-x.
Transcription factors play key roles in carcinogenesis and are therefore gaining popularity as potential therapeutic targets in drug development. A 'master regulator' transcription factor often appears to control most of the regulatory activities of the other transcription factors and their associated genes, placing it at the top of the transcriptomic regulatory hierarchy. Identifying and targeting the master regulator transcription factor is therefore important both for understanding the associated disease process and for identifying the best therapeutic option.
We present a novel two-step computational approach for identifying a master regulator transcription factor in a genome. In the first step, we test whether any master regulator transcription factor exists in the system by evaluating the concordance of two ranked lists of transcription factors with a statistical measure. If the concordance measure is statistically significant, we conclude that a master regulator exists. In the second step, provided one exists, our method identifies the master regulator transcription factor.
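The first step, testing whether two ranked lists of transcription factors agree more than chance would allow, can be illustrated with a minimal sketch. The abstract does not name the concordance statistic or the null distribution, so Spearman's rank correlation and a permutation test are assumed here purely as stand-ins. The function names and the transcription factor labels are hypothetical and are not taken from the authors' R code.

```python
import random


def spearman_rho(rank_a, rank_b):
    """Spearman correlation for two tie-free rank vectors over the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def positions(ordered_items):
    """Map each item to its 1-based position in an ordered list."""
    return {item: pos for pos, item in enumerate(ordered_items, start=1)}


def concordance_test(list_a, list_b, n_perm=2000, seed=0):
    """One-sided permutation test for concordance of two ranked lists.

    Assumption: concordance is measured by Spearman's rho (a stand-in,
    not necessarily the statistic used in the paper). Returns the
    observed rho and a permutation p-value.
    """
    if len(set(list_a)) != len(list_a) or set(list_a) != set(list_b):
        raise ValueError("both lists must rank the same distinct items")
    if len(list_a) < 2:
        raise ValueError("need at least two items to compare rankings")

    pos_a, pos_b = positions(list_a), positions(list_b)
    items = sorted(pos_a)
    ranks_a = [pos_a[i] for i in items]
    ranks_b = [pos_b[i] for i in items]
    observed = spearman_rho(ranks_a, ranks_b)

    # Null hypothesis: the second ranking is unrelated to the first,
    # so every permutation of its ranks is equally likely.
    rng = random.Random(seed)
    shuffled = ranks_b[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman_rho(ranks_a, shuffled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical transcription factor labels, for illustration only.
    ranking_1 = ["TF1", "TF2", "TF3", "TF4", "TF5", "TF6", "TF7", "TF8"]
    ranking_2 = ["TF1", "TF3", "TF2", "TF4", "TF5", "TF7", "TF6", "TF8"]
    rho, p = concordance_test(ranking_1, ranking_2)
    print(f"rho = {rho:.3f}, permutation p-value = {p:.4f}")
```

Under this sketch, a small p-value would be read as evidence of concordance and, following the abstract's logic, as evidence that a master regulator exists. How the master regulator is then selected in the second step is not specified in the abstract and is not modeled here.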
In simulations, our method performs reasonably well in validating the existence of a master regulator when the number of subjects in each treatment group is reasonably large. Applied to two real datasets, our method supports the existence of master regulators and identifies biologically meaningful ones. R code implementing our method on sample test data is available at http://www.somnathdatta.org/software .
We have developed a screening method for identifying the 'master regulator' transcription factor using only gene expression data. Understanding the regulatory structure and finding the master regulator helps narrow the search space for biomarkers of complex diseases such as cancer. In addition to identifying the master regulator, our method provides an overview of the regulatory structure of the transcription factors, which control the global gene expression profiles and, consequently, cell functioning.