Dozmorov Mikhail G, Mu Wancen, Davis Eric S, Lee Stuart, Triche Timothy J, Phanstiel Douglas H, Love Michael I
Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA.
Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA.
Bioinform Adv. 2022 Dec 16;2(1):vbac097. doi: 10.1093/bioadv/vbac097. eCollection 2022.
CTCF (CCCTC-binding factor) is an 11-zinc-finger DNA-binding protein that regulates much of the eukaryotic genome's 3D structure and function. The diversity of CTCF binding motifs has led to a fragmented landscape of CTCF binding data. We collected position weight matrices of CTCF binding motifs and defined strand-oriented CTCF binding sites in the human and mouse genomes, including the recent Telomere-to-Telomere and mm39 assemblies. We included selected experimentally determined and predicted CTCF binding sites, such as CTCF-bound cis-regulatory elements from SCREEN ENCODE. We recommend filtering strategies for CTCF binding motifs and demonstrate that liftOver is a viable alternative for converting CTCF coordinates between assemblies. Our comprehensive data resource and usage recommendations can serve to harmonize and strengthen the reproducibility of genomic studies utilizing CTCF binding data.
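As a rough illustration of what defining strand-oriented binding sites from a position weight matrix can involve, the sketch below scans a sequence on both strands and keeps windows that pass a score filter. The count matrix is a hypothetical 4-bp toy, not a real CTCF motif, and the function names and the score threshold are assumptions made for this example; the resource itself may rely on different tools, matrices, and cutoffs.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical toy count matrix (rows: motif positions; columns: A, C, G, T).
# Illustration only; this is NOT a real CTCF motif. Its consensus is CCAG.
COUNTS = [
    [1, 16, 1, 2],
    [1, 15, 2, 2],
    [16, 1, 2, 1],
    [1, 2, 16, 1],
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2 odds against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append(
            {b: math.log2(((c + pseudocount) / total) / background)
             for b, c in zip(BASES, row)}
        )
    return pwm


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def score(pwm, window):
    """Sum the log-odds of each base in a window the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(seq, pwm, min_score):
    """Return (start, end, strand, score) hits; coordinates are 0-based, half-open."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        # Score the window as read (+) and as its reverse complement (-).
        for strand, oriented in (("+", window), ("-", reverse_complement(window))):
            s = score(pwm, oriented)
            if s >= min_score:  # assumed score filter; the cutoff is arbitrary here
                hits.append((start, start + width, strand, round(s, 2)))
    return hits


if __name__ == "__main__":
    pwm = to_log_odds(COUNTS)
    for hit in scan("TTCCAGTTCTGGTT", pwm, min_score=5.0):
        print(hit)
```

In this toy sequence, the forward consensus and its reverse complement each occur once, so the scan reports one site per strand. The reported strand is what makes a site "oriented".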
The CTCF package is available from Bioconductor: https://bioconductor.org/packages/CTCF. Companion website: https://dozmorovlab.github.io/CTCF/. Code to reproduce the analyses: https://github.com/dozmorovlab/CTCF.dev.
Supplementary data are available at Bioinformatics Advances online.