Variable structure motifs for transcription factor binding sites.

Affiliation

MRC Biostatistics Unit, Institute of Public Health, Forvie Site, Cambridge, CB2 0SR, UK.

Publication information

BMC Genomics. 2010 Jan 14;11:30. doi: 10.1186/1471-2164-11-30.

DOI:10.1186/1471-2164-11-30

PMID:20074339

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2824720/

Abstract

BACKGROUND

Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.
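As a point of reference for the fixed-length model described above, the following is a minimal sketch of PWM log-odds scoring. The matrix values, the uniform background and the function names (pwm_log_odds, best_hit) are illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical 4-column motif: one dict of base probabilities per position.
# The values are invented for illustration; each column sums to 1.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def pwm_log_odds(site, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2 odds; positions are treated as independent."""
    if len(site) != len(pwm):
        raise ValueError("a fixed-length PWM only scores sites of its own width")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, site))


def best_hit(sequence, pwm=PWM):
    """Slide the PWM along the sequence; return (score, start) of the best window."""
    width = len(pwm)
    hits = [
        (pwm_log_odds(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits) if hits else None
```

Because every candidate site must have exactly the width of the matrix, a factor that binds in more than one configuration cannot be represented by a single matrix of this kind, which is the limitation the abstract goes on to address.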

RESULTS

We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.

CONCLUSIONS

We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
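To make the idea of a variable length site concrete, here is a simplified sketch in which one column of a PWM is optional, so a site can be scored at either of two lengths. This is not the authors' gapped PWM parameterisation: the column values, the choice of optional column, the prior P_PRESENT and all function names are assumptions made for illustration.

```python
import math

BACKGROUND = 0.25  # assumed uniform background frequency

# Hypothetical 5-column motif; the column at index 2 is an optional spacer.
COLUMNS = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # optional column
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
OPTIONAL = 2      # index of the column that may be skipped
P_PRESENT = 0.6   # assumed prior probability that the optional column is used


def _log_odds(site, columns):
    """Log2 odds of an uppercase ACGT site under the given columns."""
    return sum(math.log2(col[b] / BACKGROUND) for col, b in zip(columns, site))


def score_configurations(sequence, start):
    """Score both site lengths at one start position; return {length: score}."""
    full = COLUMNS
    short = COLUMNS[:OPTIONAL] + COLUMNS[OPTIONAL + 1:]
    scores = {}
    for cols, prior in ((full, P_PRESENT), (short, 1.0 - P_PRESENT)):
        site = sequence[start:start + len(cols)]
        if len(site) == len(cols):  # skip configurations that run off the end
            scores[len(cols)] = _log_odds(site, cols) + math.log2(prior)
    return scores


def best_gapped_hit(sequence):
    """Return (score, start, length) of the best-scoring site, or None."""
    best = None
    for start in range(len(sequence)):
        for length, score in score_configurations(sequence, start).items():
            if best is None or score > best[0]:
                best = (score, start, length)
    return best
```

In this sketch the reported length tells which configuration explains the site best, which is one way such a model can stay interpretable while adding only a single extra parameter per optional column.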

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0cc/2824720/f6186b74bd9f/1471-2164-11-30-1.jpg
