Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data.

Affiliations

Biodiversity Research Center, Academia Sinica, 115 Taipei, Taiwan.

Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024.

Publication information

Proc Natl Acad Sci U S A. 2021 May 18;118(20). doi: 10.1073/pnas.2026754118.

DOI: 10.1073/pnas.2026754118
PMID: 33975951
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8158016/
Abstract

Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density ±2 kb around transcription start sites (TSSs) with a peak at -50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (-1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 (including 115 completely) tethered binding TF pairs, indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
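The promoter window quoted in the abstract (-1 kb to +100 bp from a TSS) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the use of single-base positions on one chromosome, and the choice to treat both window bounds as inclusive are assumptions made only for illustration.

```python
# Hypothetical helper names; only the window bounds come from the abstract.
PROMOTER_UPSTREAM = 1000    # -1 kb relative to the TSS
PROMOTER_DOWNSTREAM = 100   # +100 bp relative to the TSS


def offset_from_tss(site_pos: int, tss_pos: int, strand: str) -> int:
    """Return the signed distance of a site from a TSS.

    Negative values are upstream of the TSS, positive values downstream.
    On the minus strand, upstream corresponds to larger coordinates.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    delta = site_pos - tss_pos
    return delta if strand == "+" else -delta


def in_promoter(site_pos: int, tss_pos: int, strand: str) -> bool:
    """True if the site lies within -1 kb to +100 bp of the TSS.

    Assumption: both ends of the window are inclusive.
    """
    offset = offset_from_tss(site_pos, tss_pos, strand)
    return -PROMOTER_UPSTREAM <= offset <= PROMOTER_DOWNSTREAM


def promoter_fraction(site_positions, tss_pos: int, strand: str) -> float:
    """Fraction of the given sites that fall in the promoter window."""
    sites = list(site_positions)
    if not sites:
        return 0.0
    hits = sum(in_promoter(pos, tss_pos, strand) for pos in sites)
    return hits / len(sites)
```

For a plus-strand TSS at position 1000, a site at position 950 has an offset of -50 bp, the location where the abstract reports the TFBS density peak. A genome-wide analysis would additionally need chromosome names, gene annotations, and a rule for sites near several TSSs, none of which are specified in the abstract.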
