Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA.
Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA.
Genes (Basel). 2023 Jan 21;14(2):282. doi: 10.3390/genes14020282.
Transcription factors are an integral component of the cellular machinery responsible for regulating many biological processes, and they recognize distinct DNA sequence patterns as well as internal/external signals to mediate target gene expression. The functional roles of an individual transcription factor can be traced back to the functions of its target genes. While such functional associations can be inferred from binding evidence produced by high-throughput sequencing technologies available today, including chromatin immunoprecipitation sequencing, such experiments can be resource-intensive. On the other hand, exploratory analysis driven by computational techniques can alleviate this burden by narrowing the search scope, but the results are often deemed low-quality or non-specific by biologists. In this paper, we introduce a data-driven, statistics-based strategy to predict novel functional associations for transcription factors in the model plant Arabidopsis thaliana. To achieve this, we leverage one of the largest available gene expression compendia to build a genome-wide transcriptional regulatory network and infer regulatory relationships among transcription factors and their targets. We then use this network to build a pool of likely downstream targets for each transcription factor and query each target pool for functionally enriched gene ontology terms. The results exhibited sufficient statistical significance to annotate most of the transcription factors in Arabidopsis with highly specific biological processes. We also perform DNA binding motif discovery for transcription factors based on their target pools. We show that the predicted functions and motifs strongly agree with curated databases constructed from experimental evidence. In addition, statistical analysis of the network revealed interesting patterns and connections between network topology and system-level transcriptional regulation properties. 
We believe that the methods demonstrated in this work can be extended to other species to improve the annotation of transcription factors and understand transcriptional regulation on a system level.
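
The step of querying a target pool for enriched gene ontology terms can be illustrated with a small sketch. The abstract does not say which statistical test or multiple-testing correction the authors used, so the one-sided hypergeometric test and the Bonferroni correction below are assumptions chosen for illustration. The function names, gene identifiers, and GO term labels are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def enriched_terms(target_pool, annotations, background, alpha=0.05):
    """Return (term, overlap, adjusted_p) for terms over-represented in the pool.

    Assumption: Bonferroni correction over all terms in `annotations`.
    """
    background = set(background)
    pool = set(target_pool) & background
    n_tests = len(annotations)
    hits = []
    for term, genes in annotations.items():
        genes = set(genes) & background
        overlap = len(pool & genes)
        if overlap == 0:
            continue  # nothing to test for this term
        p = hypergeom_sf(overlap, len(background), len(genes), len(pool))
        p_adj = min(1.0, p * n_tests)
        if p_adj <= alpha:
            hits.append((term, overlap, p_adj))
    return sorted(hits, key=lambda hit: hit[2])


# Toy data (hypothetical gene IDs and term labels, not from the paper).
background = [f"g{i}" for i in range(20)]
annotations = {
    "toy_term_stress": [f"g{i}" for i in range(0, 5)],
    "toy_term_growth": [f"g{i}" for i in range(5, 15)],
}
target_pool = ["g0", "g1", "g2", "g3", "g15"]

hits = enriched_terms(target_pool, annotations, background)
for term, overlap, p_adj in hits:
    print(term, overlap, round(p_adj, 4))
```

In this toy case four of the five pooled targets carry the first term, so only that term passes the threshold. A real analysis would use the full genome as background and a curated annotation source, and might prefer a less conservative correction such as a false discovery rate procedure.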