Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix.

Affiliation

The Institute of Mathematical Sciences, Chennai, Tamil Nadu, India.

Publication information

PLoS One. 2010 Mar 22;5(3):e9722. doi: 10.1371/journal.pone.0009722.

DOI: 10.1371/journal.pone.0009722
PMID: 20339533
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2842295/

Abstract

BACKGROUND

Identifying transcription factor binding sites (TFBS) in silico is key in understanding gene regulation. TFBS are string patterns that exhibit some variability, commonly modelled as "position weight matrices" (PWMs). Though convenient, the PWM has significant limitations, in particular the assumed independence of positions within the binding motif; and predictions based on PWMs are usually not very specific to known functional sites. Analysis here on binding sites in yeast suggests that correlation of dinucleotides is not limited to near-neighbours, but can extend over considerable gaps.
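To make the independence assumption concrete, here is a minimal sketch of conventional PWM scoring. It is an illustration under stated assumptions, not code from the paper: the function names, the pseudocount of 1, and the uniform background of 0.25 are all choices made for this example.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Column-wise nucleotide probabilities from equal-length aligned sites.

    The pseudocount is an assumption of this sketch; it keeps unseen
    bases from getting probability zero.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm

def log_odds(pwm, window, background=0.25):
    """Log-odds score of one window against a uniform background.

    Positions are treated as independent, so the per-position terms add.
    """
    return sum(math.log(col[base] / background)
               for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((log_odds(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

if __name__ == "__main__":
    example_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]  # toy data
    pwm = build_pwm(example_sites)
    print(best_hit(pwm, "GGCGTATAAAGCC"))
```

Because each position contributes its own additive term, a model of this form cannot express that the base at one position makes a particular base at another position more or less likely.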

METHODOLOGY/PRINCIPAL FINDINGS

I describe a straightforward generalization of the PWM model that considers frequencies of dinucleotides instead of individual nucleotides. Unlike previous efforts, this method considers all dinucleotides within an extended binding region, and does not attempt to determine a priori the significance of particular dinucleotide correlations. I describe how to use a "dinucleotide weight matrix" (DWM) to predict binding sites, dealing in particular with the complication that its entries are not independent probabilities. Benchmarks show, for many factors, a dramatic improvement over PWMs in precision of predicting known targets. In most cases, significant further improvement arises by extending the commonly defined "core motifs" by about 10 bp on either side. Though this flanking sequence shows no strong motif at the nucleotide level, the predictive power of the dinucleotide model suggests that the "signature" in DNA sequence of protein-binding affinity extends beyond the core protein-DNA contact region.
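The idea of tabulating dinucleotide frequencies for every pair of positions, adjacent or not, can be sketched as follows. This is a hedged illustration only: the abstract does not state the paper's scoring rule, so `naive_pair_score`, its division by (width - 1) to offset each position appearing in several pairs, the pseudocount, and the uniform 1/16 background are assumptions of this example and should not be read as the author's method.

```python
import math
from itertools import combinations, product

ALPHABET = "ACGT"
DINUCS = ["".join(pair) for pair in product(ALPHABET, repeat=2)]

def build_dwm(sites, pseudocount=0.5):
    """Dinucleotide probabilities for every position pair (i, j) with i < j.

    Returns a dict mapping (i, j) to a dict over the 16 dinucleotides.
    The pseudocount is an assumption of this sketch.
    """
    length = len(sites[0])
    dwm = {}
    for i, j in combinations(range(length), 2):
        counts = {d: pseudocount for d in DINUCS}
        for site in sites:
            counts[site[i] + site[j]] += 1
        total = sum(counts.values())
        dwm[(i, j)] = {d: c / total for d, c in counts.items()}
    return dwm

def naive_pair_score(dwm, window, background=1.0 / 16):
    """Illustrative score: summed pair log-ratios, divided by (width - 1).

    Each position takes part in (width - 1) pairs, so the pair terms are
    not independent. Dividing by (width - 1) is a crude correction chosen
    for this example, not the procedure described in the paper.
    """
    total = sum(math.log(probs[window[i] + window[j]] / background)
                for (i, j), probs in dwm.items())
    return total / (len(window) - 1)

if __name__ == "__main__":
    # Toy sites in which positions 0 and 3 covary: A...T or C...G.
    toy_sites = ["AGCT", "AGCT", "CGCG", "CGCG"]
    dwm = build_dwm(toy_sites)
    print(naive_pair_score(dwm, "AGCT"), naive_pair_score(dwm, "AGCG"))
```

In the toy data, a PWM built from the same four sites would score `AGCT` and `AGCG` identically, because columns 0 and 3 are each split evenly between two bases. The pair table for positions (0, 3) separates them, since `AT` was observed there and `AG` was not.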

CONCLUSION/SIGNIFICANCE

While computationally more demanding and slower than PWM-based approaches, this dinucleotide method is straightforward, both conceptually and in implementation, and can serve as a basis for future improvements.
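A rough count of table entries gives a sense of the added cost. This assumes one 16-entry table per pair of positions in a region of width $L$; the abstract says all dinucleotides in the region are considered but does not describe the storage layout. The core width of 8 bp is a hypothetical value, while the 10 bp flanks come from the abstract.

```latex
\begin{align*}
N_{\mathrm{PWM}} &= 4L, \\
N_{\mathrm{DWM}} &= 16\binom{L}{2} = 8L(L-1), \\
L = 8 + 2 \cdot 10 = 28: \quad
N_{\mathrm{PWM}} &= 112, \qquad
N_{\mathrm{DWM}} = 8 \cdot 28 \cdot 27 = 6048.
\end{align*}
```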

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238d/2842295/3fd24bb26662/pone.0009722.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238d/2842295/bdc075cae0c7/pone.0009722.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238d/2842295/53eab7aaff83/pone.0009722.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238d/2842295/5b0906252118/pone.0009722.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238d/2842295/97b74a2b444d/pone.0009722.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/238d/2842295/b04be7545d66/pone.0009722.g006.jpg

Similar articles

1
Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix.
PLoS One. 2010 Mar 22;5(3):e9722. doi: 10.1371/journal.pone.0009722.
2
A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites.
PLoS One. 2014 Jun 13;9(6):e99015. doi: 10.1371/journal.pone.0099015. eCollection 2014.
3
Optimized position weight matrices in prediction of novel putative binding sites for transcription factors in the Drosophila melanogaster genome.
PLoS One. 2013 Aug 6;8(8):e68712. doi: 10.1371/journal.pone.0068712. Print 2013.
4
A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.
Bioinformatics. 2015 Nov 1;31(21):3445-50. doi: 10.1093/bioinformatics/btv391. Epub 2015 Jun 30.
5
Increasing coverage of transcription factor position weight matrices through domain-level homology.
PLoS One. 2012;7(8):e42779. doi: 10.1371/journal.pone.0042779. Epub 2012 Aug 27.
6
DISPARE: DIScriminative PAttern REfinement for Position Weight Matrices.
BMC Bioinformatics. 2009 Nov 26;10:388. doi: 10.1186/1471-2105-10-388.
7
MARZ: an algorithm to combinatorially analyze gapped n-mer models of transcription factor binding.
BMC Bioinformatics. 2015 Jan 31;16:30. doi: 10.1186/s12859-014-0446-3.
8
Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies.
BMC Bioinformatics. 2010 May 3;11:225. doi: 10.1186/1471-2105-11-225.
9
EMQIT: a machine learning approach for energy based PWM matrix quality improvement.
Biol Direct. 2017 Aug 1;12(1):17. doi: 10.1186/s13062-017-0189-y.
10
dipwmsearch: a Python package for searching di-PWM motifs.
Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad141.

Cited by

1
DNA-Protein Binding is Dominated by Short Anchoring Elements.
Adv Sci (Weinh). 2025 May;12(19):e2414823. doi: 10.1002/advs.202414823. Epub 2025 Mar 26.
2
Cardiovascular disease-associated non-coding variants disrupt GATA4-DNA binding and regulatory functions.
HGG Adv. 2025 Apr 10;6(2):100415. doi: 10.1016/j.xhgg.2025.100415. Epub 2025 Feb 12.
3
Fundamentals for predicting transcriptional regulations from DNA sequence patterns.
J Hum Genet. 2024 Oct;69(10):499-504. doi: 10.1038/s10038-024-01256-3. Epub 2024 May 10.
4
Decoding Non-coding Variants: Recent Approaches to Studying Their Role in Gene Regulation and Human Diseases.
Front Biosci (Schol Ed). 2024 Mar 1;16(1):4. doi: 10.31083/j.fbs1601004.
5
Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model.
R Soc Open Sci. 2024 Jan 24;11(1):231088. doi: 10.1098/rsos.231088. eCollection 2024 Jan.
6
Identifying promoter sequence architectures via a chunking-based algorithm using non-negative matrix factorisation.
PLoS Comput Biol. 2023 Nov 20;19(11):e1011491. doi: 10.1371/journal.pcbi.1011491. eCollection 2023 Nov.
7
Prioritizing cardiovascular disease-associated variants altering NKX2-5 and TBX5 binding through an integrative computational approach.
J Biol Chem. 2023 Dec;299(12):105423. doi: 10.1016/j.jbc.2023.105423. Epub 2023 Nov 4.
8
Investigating the sequence landscape in the initiator core promoter element using an enhanced MARZ algorithm.
PeerJ. 2023 Jun 22;11:e15597. doi: 10.7717/peerj.15597. eCollection 2023.
9
A survey on algorithms to characterize transcription factor binding sites.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad156.
10
Model-guided engineering of DNA sequences with predictable site-specific recombination rates.
Nat Commun. 2022 Jul 20;13(1):4152. doi: 10.1038/s41467-022-31538-3.

References cited in this article

1
PhyloGibbs-MP: module prediction and discriminative motif-finding by Gibbs sampling.
PLoS Comput Biol. 2008 Aug 29;4(8):e1000156. doi: 10.1371/journal.pcbi.1000156.
2
A feature-based approach to modeling protein-DNA interactions.
PLoS Comput Biol. 2008 Aug 22;4(8):e1000154. doi: 10.1371/journal.pcbi.1000154.
3
A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system.
Nucleic Acids Res. 2008 May;36(8):2547-60. doi: 10.1093/nar/gkn048. Epub 2008 Mar 10.
4
Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm.
PLoS Biol. 2008 Feb;6(2):e27. doi: 10.1371/journal.pbio.0060027.
5
Use of an evolutionary model to provide evidence for a wide heterogeneity of required affinities between transcription factors and their binding sites in yeast.
Pac Symp Biocomput. 2008:489-500.
6
REDfly 2.0: an integrated database of cis-regulatory modules and transcription factor binding sites in Drosophila.
Nucleic Acids Res. 2008 Jan;36(Database issue):D594-8. doi: 10.1093/nar/gkm876. Epub 2007 Nov 26.
7
Nearest-neighbor non-additivity versus long-range non-additivity in TATA-box structure and its implications for TBP-binding mechanism.
Nucleic Acids Res. 2007;35(13):4409-19. doi: 10.1093/nar/gkm451. Epub 2007 Jun 18.
8
A systems approach to measuring the binding energy landscapes of transcription factors.
Science. 2007 Jan 12;315(5809):233-7. doi: 10.1126/science.1131007.
9
A genomic code for nucleosome positioning.
Nature. 2006 Aug 17;442(7104):772-8. doi: 10.1038/nature04979. Epub 2006 Jul 19.
10
Stubb: a program for discovery and analysis of cis-regulatory modules.
Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W555-9. doi: 10.1093/nar/gkl224.