Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA.
Department of Cardiovascular Medicine, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, P. R. China.
Bioinformatics. 2020 Apr 15;36(8):2352-2358. doi: 10.1093/bioinformatics/btz975.
The availability of thousands of genome-wide chromatin immunoprecipitation sequencing (ChIP-Seq) datasets across hundreds of transcription factors (TFs) and cell lines provides an unprecedented opportunity to jointly analyze large-scale TF binding in vivo, making it possible to discover potential interactions and cooperation among different TFs. Interacting and cooperating TFs can form a transcriptional regulatory module (TRM) (e.g. co-binding TFs), and identifying TRMs helps decipher combinatorial regulatory mechanisms.
We develop a computational method, tfLDA, that applies state-of-the-art topic models to multiple ChIP-Seq datasets to decipher the combinatorial binding events of multiple TFs. tfLDA learns high-order combinatorial binding patterns of TFs from multiple ChIP-Seq profiles and supports interpreting and visualizing these patterns. We apply tfLDA to two cell lines with a rich collection of TFs and identify combinatorial binding patterns that correspond to well-known TRMs and related TF co-binding events.
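To make the topic-model idea concrete, the following is a minimal sketch and not the tfLDA implementation. It assumes one common encoding, which the abstract does not specify: each genomic region is treated as a "document" and each TF with a peak in that region as a "word", so that each learned topic is a candidate combinatorial binding pattern. The function name `fit_lda`, the hyperparameters, and the toy regions are all hypothetical and chosen only for illustration.

```python
import random


def fit_lda(docs, vocab, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, standard library only).

    docs: list of lists of TF names, one list per genomic region (assumed encoding).
    Returns n_topics dicts mapping each TF to its probability under that topic.
    """
    rng = random.Random(seed)
    index = {tf: i for i, tf in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables: region-topic, topic-TF, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_vocab for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every (region, TF) token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for tf in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][index[tf]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, tf in enumerate(doc):
                w = index[tf]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional weight of each topic given all other assignments.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed topic-TF distributions; each sums to 1 over the vocabulary.
    return [
        {tf: (n_kw[t][index[tf]] + beta) / (n_k[t] + n_vocab * beta) for tf in vocab}
        for t in range(n_topics)
    ]


# Toy input (hypothetical): regions bound by one of two groups of TFs.
vocab = ["CTCF", "RAD21", "SMC3", "MYC", "MAX"]
regions = [["CTCF", "RAD21", "SMC3"]] * 20 + [["MYC", "MAX"]] * 20

topics = fit_lda(regions, vocab, n_topics=2)
for t, dist in enumerate(topics):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"pattern {t}: {top}")
```

In this sketch, the TFs with the highest probability under a topic are read as a candidate co-binding group. Real ChIP-Seq input would first require calling peaks and assigning them to shared genomic regions, a step the sketch omits.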
The R package tfLDA is freely available at https://github.com/lichen-lab/tfLDA.
Supplementary data are available at Bioinformatics online.