On the value of intra-motif dependencies of human insulator protein CTCF.

Author information

Eggeling Ralf, Gohr André, Keilwagen Jens, Mohr Michaela, Posch Stefan, Smith Andrew D, Grosse Ivo

Affiliations

Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle/Saale, Germany.

Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Quedlinburg, Germany ; Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany.

Publication information

PLoS One. 2014 Jan 22;9(1):e85629. doi: 10.1371/journal.pone.0085629. eCollection 2014.

DOI:10.1371/journal.pone.0085629
PMID:24465627
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3899044/
Abstract

The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3' end.
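The abstract contrasts the position weight matrix (PWM), which treats every nucleotide in a binding site as statistically independent, with models that capture dependencies between positions. The Python sketch below illustrates that contrast under simplifying assumptions. It fits a PWM and a first-order inhomogeneous Markov model to aligned sites and compares their log-likelihoods. In the Markov model, each position is conditioned only on its immediate predecessor, with a separate table per position. The function names, the pseudocount of 0.5, and the toy sites are hypothetical choices, not taken from the paper. The parsimonious Markov models used in the study are more general than this first-order model.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def fit_pwm(sites, pseudocount=0.5):
    """PWM: one independent nucleotide distribution per motif position."""
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = Counter(s[j] for s in sites)
        total = len(sites) + 4 * pseudocount
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def pwm_log_likelihood(pwm, site):
    """Independence assumption: the site probability is a product over positions."""
    return sum(math.log(pwm[j][b]) for j, b in enumerate(site))


def fit_markov1(sites, pseudocount=0.5):
    """First-order inhomogeneous Markov model (hypothetical helper).

    Position 0 has its own marginal distribution; every later position j
    gets its own table P(x_j | x_{j-1}), so the dependency structure may
    differ from position to position ("inhomogeneous").
    """
    width = len(sites[0])
    first = fit_pwm([s[:1] for s in sites], pseudocount)[0]
    cond = []
    for j in range(1, width):
        pairs = Counter((s[j - 1], s[j]) for s in sites)
        table = {}
        for prev in ALPHABET:
            n_prev = sum(pairs[(prev, b)] for b in ALPHABET)
            total = n_prev + 4 * pseudocount
            table[prev] = {b: (pairs[(prev, b)] + pseudocount) / total for b in ALPHABET}
        cond.append(table)
    return first, cond


def markov1_log_likelihood(model, site):
    """Chain rule: log P(x_0) + sum over j of log P(x_j | x_{j-1})."""
    first, cond = model
    ll = math.log(first[site[0]])
    for j in range(1, len(site)):
        ll += math.log(cond[j - 1][site[j - 1]][site[j]])
    return ll


if __name__ == "__main__":
    # Hypothetical toy sites: the first two positions co-vary (A-C or G-T).
    toy_sites = ["ACG", "GTG", "ACG", "GTG"]
    pwm = fit_pwm(toy_sites)
    markov = fit_markov1(toy_sites)
    for site in sorted(set(toy_sites)):
        print(site, round(pwm_log_likelihood(pwm, site), 3),
              round(markov1_log_likelihood(markov, site), 3))
```

In the toy sites, the first two positions co-vary perfectly. A PWM cannot represent this, so the Markov model assigns the training sites a higher log-likelihood. On real data, a fair comparison would use held-out sites, because the Markov model has more parameters and can overfit when data are limited.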

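The abstract credits inhomogeneous parsimonious Markov models with capturing complex dependencies while avoiding overfitting. As we understand this model class, its key idea is to merge contexts (preceding nucleotides) that predict the current nucleotide equally well, so fewer parameters need to be estimated. The hedged sketch below restricts this idea to a context of depth 1. For each position, it enumerates all 15 partitions of {A, C, G, T} for the preceding nucleotide and keeps the partition with the best BIC score. Several choices are assumptions for illustration: the depth-1 restriction, BIC as the selection criterion, and all function names. The models in the paper allow deeper context trees, and their exact learning procedure is not described in this abstract.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (15 for ACGT)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + part


def bic_score(sites, j, partition):
    """BIC of modelling position j given a grouping of the nucleotide at j - 1."""
    log_lik = 0.0
    for block in partition:
        # Pool all sites whose preceding nucleotide falls in this block and
        # use the maximum-likelihood distribution of the current nucleotide.
        counts = Counter(s[j] for s in sites if s[j - 1] in block)
        n = sum(counts.values())
        log_lik += sum(c * math.log(c / n) for c in counts.values())
    n_params = 3 * len(partition)  # 3 free probabilities per merged context
    return log_lik - 0.5 * n_params * math.log(len(sites))


def learn_depth1_structure(sites):
    """Hypothetical helper: best-scoring context partition for each position after the first."""
    width = len(sites[0])
    return [
        max(set_partitions(list(ALPHABET)), key=lambda p: bic_score(sites, j, p))
        for j in range(1, width)
    ]


if __name__ == "__main__":
    # Hypothetical toy sites: the second nucleotide depends only on whether
    # the first one is a purine (A/G) or a pyrimidine (C/T).
    toy_sites = ["AA", "GA", "CC", "TC"] * 10
    for j, partition in enumerate(learn_depth1_structure(toy_sites), start=1):
        print(f"position {j}: contexts grouped as {partition}")
```

In the toy data, the learned structure merges the purine contexts and the pyrimidine contexts into two blocks instead of keeping four separate ones. This halves the number of conditional parameters relative to a full first-order model (6 instead of 12) without losing any of the dependency.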

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/d90ac32bc5bc/pone.0085629.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/2aa9fc2decd4/pone.0085629.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/a66664a93bce/pone.0085629.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/524cf82a1bc6/pone.0085629.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/4ee9ab68a9db/pone.0085629.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/761d1134db56/pone.0085629.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/bc1346a51070/pone.0085629.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/947a/3899044/21274f3725f0/pone.0085629.g008.jpg

Similar articles

1
On the value of intra-motif dependencies of human insulator protein CTCF.
PLoS One. 2014 Jan 22;9(1):e85629. doi: 10.1371/journal.pone.0085629. eCollection 2014.
2
Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data.
BMC Bioinformatics. 2015 Nov 9;16:375. doi: 10.1186/s12859-015-0797-4.
3
CTCF genomic binding sites in Drosophila and the organisation of the bithorax complex.
PLoS Genet. 2007 Jul;3(7):e112. doi: 10.1371/journal.pgen.0030112.
4
Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences.
Nucleic Acids Res. 2016 Jul 27;44(13):6055-69. doi: 10.1093/nar/gkw521. Epub 2016 Jun 9.
5
Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains.
Genome Res. 2009 Jan;19(1):24-32. doi: 10.1101/gr.082800.108. Epub 2008 Dec 3.
6
The murine IgH locus contains a distinct DNA sequence motif for the chromatin regulatory factor CTCF.
J Biol Chem. 2019 Sep 13;294(37):13580-13592. doi: 10.1074/jbc.RA118.007348. Epub 2019 Jul 8.
7
The characteristics of CTCF binding sequences contribute to enhancer blocking activity.
Nucleic Acids Res. 2024 Sep 23;52(17):10180-10193. doi: 10.1093/nar/gkae666.
8
Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data.
Nucleic Acids Res. 2008 Sep;36(16):5221-31. doi: 10.1093/nar/gkn488. Epub 2008 Aug 6.
9
Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies.
BMC Bioinformatics. 2017 Mar 1;18(1):141. doi: 10.1186/s12859-017-1495-1.
10
Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA.
BMC Genomics. 2016 Aug 15;17(1):637. doi: 10.1186/s12864-016-2940-7.

Cited by

1
Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites.
Genet Mol Biol. 2024 Jan 19;46(4):e20230048. doi: 10.1590/1678-4685-GMB-2023-0048. eCollection 2024.
2
Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model.
R Soc Open Sci. 2024 Jan 24;11(1):231088. doi: 10.1098/rsos.231088. eCollection 2024 Jan.
3
A survey on algorithms to characterize transcription factor binding sites.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad156.
4
A universal framework for detecting cis-regulatory diversity in DNA regions.
Genome Res. 2021 Sep;31(9):1646-1662. doi: 10.1101/gr.274563.120. Epub 2021 Jul 19.
5
Resolving diverse protein-DNA footprints from exonuclease-based ChIP experiments.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i367-i375. doi: 10.1093/bioinformatics/btab274.
6
Allele specific chromatin signals, 3D interactions, and motif predictions for immune and B cell related diseases.
Sci Rep. 2019 Feb 25;9(1):2695. doi: 10.1038/s41598-019-39633-0.
7
Disentangling transcription factor binding site complexity.
Nucleic Acids Res. 2018 Nov 16;46(20):e121. doi: 10.1093/nar/gky683.
8
CircularLogo: A lightweight web application to visualize intra-motif dependencies.
BMC Bioinformatics. 2017 May 22;18(1):269. doi: 10.1186/s12859-017-1680-2.
9
InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites.
Bioinformatics. 2017 Feb 15;33(4):580-582. doi: 10.1093/bioinformatics/btw689.
10
RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.
Bioinformatics. 2016 Jun 15;32(12):i351-i359. doi: 10.1093/bioinformatics/btw259.