UniBind: maps of high-confidence direct TF-DNA interactions across nine species.
Affiliations
Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0349, Oslo, Norway.
Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, 94305, USA.
Publication information
BMC Genomics. 2021 Jun 26;22(1):482. doi: 10.1186/s12864-021-07760-6.
BACKGROUND
Transcription factors (TFs) bind specifically to TF binding sites (TFBSs) at cis-regulatory regions to control transcription. It is critical to locate these TF-DNA interactions to understand transcriptional regulation. Efforts to predict bona fide TFBSs benefit from experimental data that map the DNA binding regions of TFs, obtained by chromatin immunoprecipitation followed by sequencing (ChIP-seq).
RESULTS
In this study, we processed ~10,000 public ChIP-seq datasets from nine species to provide high-quality TFBS predictions. After quality control, this yielded ~56 million predicted TFBSs with experimental and computational support for direct TF-DNA interactions, covering 644 TFs in >1000 cell lines and tissues. These TFBSs were used to predict >197,000 cis-regulatory modules representing clusters of binding events in the corresponding genomes. The high quality of the TFBSs was supported by their evolutionary conservation, enrichment at active cis-regulatory regions, and capacity to predict combinatorial binding of TFs. Further, we confirmed that the cell type and tissue specificity of enhancer activity correlated with the number of TFs with binding sites predicted in these regions. All data are provided to the community through the UniBind database, which can be accessed through its web interface (https://unibind.uio.no/), a dedicated RESTful API, and genomic tracks. Finally, we provide an enrichment tool, available as a web service and an R package, for users to find TFs with enriched TFBSs in a set of provided genomic regions.
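To make the idea of a cis-regulatory module as a cluster of binding events concrete, the following is a minimal sketch of one simple way to group nearby TFBSs. It is not the method used to build UniBind: the function name `cluster_sites`, the `max_gap` and `min_tfs` parameters and their values, and the toy coordinates are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A TFBS is (chromosome, start, end, tf_name); coordinates are 0-based, half-open.
Site = Tuple[str, int, int, str]
# A module is (chromosome, start, end, sorted list of distinct TF names).
Module = Tuple[str, int, int, List[str]]


def cluster_sites(sites: Iterable[Site], max_gap: int = 100, min_tfs: int = 2) -> List[Module]:
    """Hypothetical helper: chain sites separated by at most max_gap bases,
    then keep only chains bound by at least min_tfs distinct TFs."""
    by_chrom: Dict[str, List[Site]] = defaultdict(list)
    for site in sites:
        by_chrom[site[0]].append(site)

    modules: List[Module] = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda s: (s[1], s[2]))
        # Open the first cluster on this chromosome.
        start, end, tfs = ordered[0][1], ordered[0][2], {ordered[0][3]}
        for _, s_start, s_end, tf in ordered[1:]:
            if s_start - end <= max_gap:
                # Close enough to the current cluster: extend it.
                end = max(end, s_end)
                tfs.add(tf)
            else:
                # Gap too large: emit the current cluster if it qualifies, then reset.
                if len(tfs) >= min_tfs:
                    modules.append((chrom, start, end, sorted(tfs)))
                start, end, tfs = s_start, s_end, {tf}
        if len(tfs) >= min_tfs:
            modules.append((chrom, start, end, sorted(tfs)))
    return modules


# Toy input with invented coordinates (not taken from UniBind).
example_sites = [
    ("chr1", 100, 112, "CTCF"),
    ("chr1", 150, 160, "GATA1"),
    ("chr1", 5000, 5010, "CTCF"),
    ("chr2", 40, 52, "TAL1"),
    ("chr2", 60, 70, "GATA1"),
    ("chr2", 75, 85, "TAL1"),
]
modules = cluster_sites(example_sites, max_gap=100, min_tfs=2)
for module in modules:
    print(module)
```

In this sketch the isolated chr1 site at position 5000 is dropped because only one TF binds there, while the neighbouring sites on each chromosome are merged into a single module. A fixed gap threshold is the simplest possible choice and would likely need tuning per genome.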
CONCLUSIONS
UniBind is the first resource of its kind, providing the largest collection of high-confidence direct TF-DNA interactions in nine species.