Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors.

Author information

Omidi Saeed, Zavolan Mihaela, Pachkov Mikhail, Breda Jeremie, Berger Severin, van Nimwegen Erik

Affiliations

Biozentrum, University of Basel, Basel, Switzerland.

Swiss Institute of Bioinformatics, Basel, Switzerland.

Publication information

PLoS Comput Biol. 2017 Jul 28;13(7):e1005176. doi: 10.1371/journal.pcbi.1005176. eCollection 2017 Jul.

DOI:10.1371/journal.pcbi.1005176

PMID:28753602

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5550003/

Abstract

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult for non-experts to use in practice. Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch that allow users to automatically perform 'motif finding', i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT 'dilogo' motifs.
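
The contrast the abstract draws between a PSWM and a model with pairwise dependencies can be illustrated with a small sketch. This is not the authors' DWT method or the code behind dwt.unibas.ch: it only scores sequences with a plain PSWM (independent columns) and measures the dependency between two positions with mutual information, a generic statistic swapped in here for illustration. The toy sites, the pseudocount of 0.5, the uniform background, and all function names are assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"

def pswm(sites, pseudocount=0.5):
    """Estimate per-position base probabilities; each column is independent."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix

def pswm_log_odds(matrix, seq, background=0.25):
    """Log-odds score of seq: a sum of independent per-position terms."""
    return sum(math.log(col[base] / background) for col, base in zip(matrix, seq))

def mutual_information(sites, i, j):
    """Mutual information (bits) between positions i and j of aligned sites."""
    n = len(sites)
    pair = Counter((s[i], s[j]) for s in sites)
    left = Counter(s[i] for s in sites)
    right = Counter(s[j] for s in sites)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in pair.items()
    )

# Hypothetical aligned sites: positions 0 and 3 co-vary (A...T or C...G),
# so they are not nearest neighbors but are fully dependent.
sites = ["AGGT", "AGGT", "CGGG", "CGGG", "AGGT", "CGGG"]

matrix = pswm(sites)
seen = pswm_log_odds(matrix, "AGGT")    # combination present in the data
unseen = pswm_log_odds(matrix, "AGGG")  # combination absent from the data

mi_distal = mutual_information(sites, 0, 3)
mi_adjacent = mutual_information(sites, 1, 2)

print(f"PSWM score AGGT: {seen:.3f}, AGGG: {unseen:.3f}")
print(f"MI(0,3) = {mi_distal:.3f} bits, MI(1,2) = {mi_adjacent:.3f} bits")
```

In this toy set the PSWM assigns the same score to the observed site AGGT and to the never-observed AGGG, because it only sees the marginal base frequencies at each position. The dependency between positions 0 and 3 is exactly the kind of distal, non-adjacent signal that a nearest-neighbor model would also miss.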

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec63/5550003/ad591edb81db/pcbi.1005176.g001.jpg
