Dept. of Neuroscience, 5507 WIMR, University of Wisconsin-Madison, Madison, United States of America.
PLoS Comput Biol. 2020 Apr 6;16(4):e1007800. doi: 10.1371/journal.pcbi.1007800. eCollection 2020 Apr.
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to identify the TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of 'target' or 'non-target', followed by Fisher exact tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in the gene bodies and flanking regions of the genes in a gene list, without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting the TFs and cofactors that drive gene changes in 4 settings: 1) a cell line expressing or lacking a single TF; 2) breast tumors divided along PAM50 designations; 3) whole brain samples from WT mice or from mice lacking a single TF in a particular neuronal subtype; and 4) single-cell RNA-seq analysis of neurons divided by immediate early gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.
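To make the conventional binary approach concrete (the approach MAGIC is contrasted against, not MAGIC itself), here is a minimal Python sketch. It assumes each TF track has already been reduced to a set of 'target' gene names (for example, genes with a ChIP-seq peak in the gene body or flanks). It then computes a one-sided Fisher exact test p-value as the upper tail of the hypergeometric distribution. The function names `fisher_enrichment_p` and `rank_tfs`, the TF labels, and the gene names are hypothetical. No multiple-testing correction is applied.

```python
from math import comb


def fisher_enrichment_p(query, targets, universe):
    """Return (overlap, p) for a one-sided Fisher exact test of over-representation.

    query    -- genes in the list of interest (e.g. differentially expressed genes)
    targets  -- genes labelled 'target' for one TF track (binary designation)
    universe -- all genes considered (the background)
    """
    universe = set(universe)
    query = set(query) & universe
    targets = set(targets) & universe
    N, K, n = len(universe), len(targets), len(query)
    a = len(query & targets)  # query genes that are also TF targets
    # Upper tail of the hypergeometric distribution: P(overlap >= a)
    tail = sum(comb(K, k) * comb(N - K, n - k) for k in range(a, min(n, K) + 1))
    return a, tail / comb(N, n)


def rank_tfs(query, tf_targets, universe):
    """Rank TFs (hypothetical dict: TF name -> target gene set) by enrichment p-value."""
    results = []
    for tf, targets in tf_targets.items():
        a, p = fisher_enrichment_p(query, targets, universe)
        results.append((tf, a, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy, hypothetical data: 20 background genes and two TF "tracks"
    universe = [f"G{i}" for i in range(20)]
    tf_targets = {
        "TF_A": {"G0", "G1", "G2", "G3", "G4"},
        "TF_B": {"G15", "G16", "G17", "G18", "G19"},
    }
    query = {"G0", "G1", "G2", "G3", "G10"}
    for tf, overlap, p in rank_tfs(query, tf_targets, universe):
        print(tf, overlap, f"{p:.4g}")
```

In this scheme, the threshold that turns ChIP-seq signal into a 'target' label is fixed before any test is run. That a priori binary classification is what the abstract says MAGIC does without.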