Inherent limitations of probabilistic models for protein-DNA binding specificity.

Author information

Ruan Shuxiang, Stormo Gary D

Affiliation

Department of Genetics and The Edison Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America.

Publication information

PLoS Comput Biol. 2017 Jul 7;13(7):e1005638. doi: 10.1371/journal.pcbi.1005638. eCollection 2017 Jul.

DOI:10.1371/journal.pcbi.1005638

PMID:28686588

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5521849/

Abstract

The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions but rather are caused by the non-linear relationship between binding affinity and binding probability and the fact that independent normalization at each position skews the site probabilities. Generally, probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy that should be used whenever possible.
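The abstract's central argument can be checked on a toy example. The sketch below is an illustration, not the authors' code, and rests on several assumptions: a hypothetical three-position additive energy matrix `ENERGY` (in units of kT), a parameter `mu` standing in for protein concentration (a chemical potential), a simple two-state equilibrium occupancy 1/(1 + exp(E - mu)), and an input pool in which every 3-mer is equally abundant. It builds a probability matrix from the bound sites by normalizing each position independently, then compares the product of its columns with the true distribution of bound sites. Because the energies are strictly additive, any mismatch comes from saturation and per-position normalization, not from dependence between positions.

```python
import itertools
import math

BASES = "ACGT"

# Hypothetical additive energy matrix (kT units): ENERGY[i][b] is the penalty
# for base b at position i relative to the preferred base (0.0).
# The consensus (lowest-energy) site is therefore "ACG".
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 3.0, "T": 2.5},
    {"A": 3.0, "C": 0.0, "G": 2.0, "T": 3.0},
    {"A": 2.5, "C": 3.0, "G": 0.0, "T": 1.5},
]


def site_energy(site):
    """Additive binding energy: positions contribute independently."""
    return sum(ENERGY[i][b] for i, b in enumerate(site))


def occupancy(site, mu):
    """Two-state equilibrium binding probability, 1 / (1 + exp(E - mu)).

    Larger mu means higher protein concentration. The curve saturates at 1,
    which is the non-linear link between affinity and binding probability.
    """
    return 1.0 / (1.0 + math.exp(site_energy(site) - mu))


def bound_site_distribution(mu):
    """Distribution of bound sites, assuming every L-mer is equally abundant."""
    sites = ["".join(s) for s in itertools.product(BASES, repeat=len(ENERGY))]
    weights = {s: occupancy(s, mu) for s in sites}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def fit_pwm(dist):
    """Probability matrix from base frequencies, one column per position.

    Each column is a marginal of `dist`, so it sums to 1 on its own: this is
    the independent per-position normalization.
    """
    length = len(next(iter(dist)))
    pwm = [{b: 0.0 for b in BASES} for _ in range(length)]
    for site, p in dist.items():
        for i, b in enumerate(site):
            pwm[i][b] += p
    return pwm


def pwm_probability(pwm, site):
    """Site probability under the probabilistic model: product of columns."""
    return math.prod(pwm[i][b] for i, b in enumerate(site))


def max_error(mu):
    """Largest absolute gap between the PWM and the true bound-site distribution."""
    dist = bound_site_distribution(mu)
    pwm = fit_pwm(dist)
    return max(abs(pwm_probability(pwm, s) - p) for s, p in dist.items())


if __name__ == "__main__":
    best = "ACG"
    for mu in (-10.0, 0.0, 5.0, 10.0):
        dist = bound_site_distribution(mu)
        pwm = fit_pwm(dist)
        print(f"mu={mu:5.1f}  true P({best})={dist[best]:.4f}  "
              f"PWM P({best})={pwm_probability(pwm, best):.4f}  "
              f"max |error|={max_error(mu):.2e}")
```

Far from saturation (mu = -10) occupancy is essentially proportional to exp(-E), which factorizes over positions, so the column-product model reproduces the bound-site distribution almost exactly. At mu = 5 the highest-affinity sites saturate, and in this toy setting the matrix assigns the consensus `ACG` roughly twice its true probability even though the energy model has no inter-position dependence at all.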

Similar articles

Inferring binding energies from selected binding sites.

PLoS Comput Biol. 2009 Dec;5(12):e1000590. doi: 10.1371/journal.pcbi.1000590. Epub 2009 Dec 4.

OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif.

BMC Bioinformatics. 2009 Jul 7;10:208. doi: 10.1186/1471-2105-10-208.

Probabilistic Inference on Multiple Normalized Signal Profiles from Next Generation Sequencing: Transcription Factor Binding Sites.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1416-28. doi: 10.1109/TCBB.2015.2424421.

Structure-based ab initio prediction of transcription factor-binding sites.

Methods Mol Biol. 2009;541:23-41. doi: 10.1007/978-1-59745-243-4_2.

Position dependencies in transcription factor binding sites.

Bioinformatics. 2007 Apr 15;23(8):933-41. doi: 10.1093/bioinformatics/btm055. Epub 2007 Feb 18.

Automatic extraction of position specific cooccurrence of transcription factor bindings on promoters.

Pac Symp Biocomput. 1998:252-63.

RankMotif++: a motif-search algorithm that accounts for relative ranks of K-mers in binding transcription factors.

Bioinformatics. 2007 Jul 1;23(13):i72-9. doi: 10.1093/bioinformatics/btm224.

Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix.

PLoS One. 2010 Mar 22;5(3):e9722. doi: 10.1371/journal.pone.0009722.

Modeling within-motif dependence for transcription factor binding site predictions.

Bioinformatics. 2004 Apr 12;20(6):909-16. doi: 10.1093/bioinformatics/bth006. Epub 2004 Jan 29.

Cited by

TFscope: systematic analysis of the sequence features involved in the binding preferences of transcription factors.

Genome Biol. 2024 Jul 10;25(1):187. doi: 10.1186/s13059-024-03321-8.

Eukaryotic gene regulation at equilibrium, or non?

Curr Opin Syst Biol. 2022 Sep;31. doi: 10.1016/j.coisb.2022.100435. Epub 2022 Oct 20.

Modeling binding specificities of transcription factor pairs with random forests.

BMC Bioinformatics. 2022 Jun 3;23(1):212. doi: 10.1186/s12859-022-04734-7.

Diffusion of DNA-Binding Species in the Nucleus: A Transient Anomalous Subdiffusion Model.

Biophys J. 2020 May 5;118(9):2151-2167. doi: 10.1016/j.bpj.2020.03.015. Epub 2020 Apr 4.

Functional effects of variation in transcription factor binding highlight long-range gene regulation by epromoters.

Nucleic Acids Res. 2020 Apr 6;48(6):2866-2879. doi: 10.1093/nar/gkaa123.

Sharing DNA-binding information across structurally similar proteins enables accurate specificity determination.

Nucleic Acids Res. 2020 Jan 24;48(2):e9. doi: 10.1093/nar/gkz1087.

ChIPulate: A comprehensive ChIP-seq simulation pipeline.

PLoS Comput Biol. 2019 Mar 21;15(3):e1006921. doi: 10.1371/journal.pcbi.1006921. eCollection 2019 Mar.

Degenerate Pax2 and Senseless binding motifs improve detection of low-affinity sites required for enhancer specificity.

PLoS Genet. 2018 Apr 4;14(4):e1007289. doi: 10.1371/journal.pgen.1007289. eCollection 2018 Apr.

Comparison of discriminative motif optimization using matrix and DNA shape-based models.

BMC Bioinformatics. 2018 Mar 6;19(1):86. doi: 10.1186/s12859-018-2104-7.

SelexGLM differentiates androgen and glucocorticoid receptor DNA-binding preference over an extended binding site.

Genome Res. 2018 Jan;28(1):111-121. doi: 10.1101/gr.222844.117. Epub 2017 Dec 1.
