Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising D-85354, Germany.
Institute for Advanced Study, Technical University of Munich, Garching D-85748, Germany.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad026. Epub 2023 May 3.
Eukaryotic gene expression is controlled by cis-regulatory elements (CREs), including promoters and enhancers, which are bound by transcription factors (TFs). Differential expression of TFs and their binding affinity at putative CREs determine tissue- and developmental stage-specific transcriptional activity. Consolidating genomic datasets can offer further insights into the accessibility of CREs, TF activity, and, thus, gene regulation. However, the integration and analysis of multimodal datasets are hampered by considerable technical challenges. While methods exist for highlighting differential TF activity from combined chromatin state data (e.g., chromatin immunoprecipitation [ChIP], ATAC, or DNase sequencing) and RNA sequencing data, they are not convenient to use, have limited support for large-scale data processing, and provide only minimal functionality for visually interpreting results.
We developed TF-Prioritizer, an automated pipeline that prioritizes condition-specific TFs from multimodal data and generates an interactive web report. We demonstrated its potential by identifying known TFs along with their target genes, as well as previously unreported TFs active in lactating mouse mammary glands. Additionally, we studied a variety of ENCODE datasets for cell lines K562 and MCF-7, including 12 histone modification ChIP sequencing as well as ATAC and DNase sequencing datasets, for which we observed and discuss assay-specific differences.
TF-Prioritizer accepts ATAC, DNase, or ChIP sequencing and RNA sequencing data as input and identifies TFs with differential activity, thus offering an understanding of genome-wide gene regulation, potential pathogenesis, and therapeutic targets in biomedical research.