A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites.

Author information

Santolini Marc, Mora Thierry, Hakim Vincent

Affiliations

Laboratoire de Physique Statistique, CNRS, Université P. et M. Curie, Université D. Diderot, École Normale Supérieure, Paris, France.

Publication information

PLoS One. 2014 Jun 13;9(6):e99015. doi: 10.1371/journal.pone.0099015. eCollection 2014.

DOI:10.1371/journal.pone.0099015

PMID:24926895

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4057186/

Abstract

The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions, and is generally inaccurate: analysing fly and mouse in vivo ChIP-seq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to a TF-DNA binding energy that, like the PWM model, depends additively on the nucleotide identity at each position of the TFBS, but that also includes additive contributions from pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs, and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides, and find that they are sparse and predominantly located between consecutive base pairs in the flanking regions of TFBSs. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in achieving this accurate description of TFBS statistics. The PIM is computationally tractable, and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs.
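The energy form described in the abstract can be written, for a site s = s_1…s_L, as E(s) = Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j); setting every pair term J_ij to zero leaves only the single-position terms, i.e. the PWM. Below is a minimal pure-Python sketch, assuming the convention P(s) ∝ exp(-E(s)) (sign conventions vary, and the paper's own parameterization may differ). The names pim_energy, site_probabilities and marginal, and all parameter values, are hypothetical; exact enumeration over all 4^L sites is only meant for a tiny illustrative motif, not for fitting real data.

```python
import itertools
import math

BASES = "ACGT"
IDX = {b: k for k, b in enumerate(BASES)}


def pim_energy(seq, h, J):
    """Pairwise-interaction energy of one site (assumed: lower = more probable).

    E(s) = sum_i h[i][s_i] + sum_{(i, j) in J} J[(i, j)][s_i][s_j]

    h: list of L rows, each holding 4 single-position terms (the PWM part).
    J: dict {(i, j): 4x4 nested list} with i < j; pairs absent from the
       dict have zero coupling, so sparse interactions are cheap to store.
    """
    s = [IDX[b] for b in seq]
    energy = sum(h[i][s[i]] for i in range(len(s)))
    for (i, j), Jij in J.items():
        energy += Jij[s[i]][s[j]]
    return energy


def site_probabilities(h, J):
    """Exact P(s) = exp(-E(s)) / Z by enumerating all 4**L sites (small L only)."""
    L = len(h)
    sites = ["".join(p) for p in itertools.product(BASES, repeat=L)]
    weights = [math.exp(-pim_energy(s, h, J)) for s in sites]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(sites, weights)}


def marginal(probs, positions, bases):
    """Probability that the given positions carry the given bases."""
    return sum(p for s, p in probs.items()
               if all(s[i] == b for i, b in zip(positions, bases)))


# Hypothetical 3-bp motif: arbitrary single-position terms, plus one coupling
# that lowers the energy of A at position 0 together with T at position 2.
h = [[0.2, -0.5, 0.1, 0.3],
     [-0.4, 0.0, 0.6, 0.1],
     [0.3, 0.2, -0.1, -0.6]]
J = {(0, 2): [[-2.0 if (a, b) == (0, 3) else 0.0 for b in range(4)]
              for a in range(4)]}

pwm = site_probabilities(h, {})  # J = 0: independent positions (PWM limit)
pim = site_probabilities(h, J)

for name, probs in [("PWM", pwm), ("PIM", pim)]:
    joint = marginal(probs, (0, 2), "AT")
    product = marginal(probs, (0,), "A") * marginal(probs, (2,), "T")
    print(name, round(joint, 4), round(product, 4))
```

In the PWM limit the joint probability of A at position 0 and T at position 2 equals the product of the single-position probabilities; the coupling J_02 breaks that factorization, which is the kind of inter-position dependence a PWM cannot represent.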

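In a maximum entropy model with pairwise constraints, the fitted distribution is required to match the observed single-position frequencies f_i(a) (as a PWM does) and also the observed pair frequencies f_ij(a, b). A quick way to see what an independent-position model misses is to compute the connected correlations C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b) from aligned sites. The sketch below assumes equal-length, already aligned sites; the function names and the toy sites are hypothetical, and the fitting of the couplings themselves is not shown.

```python
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def one_point(sites):
    """f_i(a): fraction of aligned sites with base a at position i."""
    n, L = len(sites), len(sites[0])
    return [{a: c / n for a, c in Counter(s[i] for s in sites).items()}
            for i in range(L)]


def two_point(sites):
    """f_ij(a, b): fraction of sites with a at position i and b at j (i < j)."""
    n, L = len(sites), len(sites[0])
    return {(i, j): {ab: c / n
                     for ab, c in Counter((s[i], s[j]) for s in sites).items()}
            for i, j in combinations(range(L), 2)}


def connected_correlations(f1, f2):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b).

    A model with independent positions (a PWM) predicts C = 0 for every pair.
    """
    return {(i, j, a, b): fij.get((a, b), 0.0) - f1[i].get(a, 0.0) * f1[j].get(b, 0.0)
            for (i, j), fij in f2.items()
            for a in BASES for b in BASES}


# Hypothetical aligned sites: position 1 is always G, while positions 0 and 2
# co-vary (A goes with T, C goes with A).
sites = ["AGT", "AGT", "CGA", "CGA"]
f1, f2 = one_point(sites), two_point(sites)
C = connected_correlations(f1, f2)
print(C[(0, 2, "A", "T")])  # 0.25: observed 0.5 vs. independent prediction 0.5 * 0.5
```

A PWM built from these sites reproduces every f_i(a) exactly, so the nonzero C_02(A, T) is a statistic it cannot capture; in a pairwise maximum entropy fit, the couplings J_ij are adjusted until the model's pair frequencies match f_ij.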

Similar articles

A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites.

PLoS One. 2014 Jun 13;9(6):e99015. doi: 10.1371/journal.pone.0099015. eCollection 2014.

Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix.

PLoS One. 2010 Mar 22;5(3):e9722. doi: 10.1371/journal.pone.0009722.

A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.

Bioinformatics. 2015 Nov 1;31(21):3445-50. doi: 10.1093/bioinformatics/btv391. Epub 2015 Jun 30.

Transcription Factor Information System (TFIS): A Tool for Detection of Transcription Factor Binding Sites.

Interdiscip Sci. 2017 Sep;9(3):378-391. doi: 10.1007/s12539-016-0168-5. Epub 2016 Apr 6.

Tree-based position weight matrix approach to model transcription factor binding site profiles.

PLoS One. 2011;6(9):e24210. doi: 10.1371/journal.pone.0024210. Epub 2011 Sep 2.

TEMPLE: analysing population genetic variation at transcription factor binding sites.

Mol Ecol Resour. 2016 Nov;16(6):1428-1434. doi: 10.1111/1755-0998.12535. Epub 2016 May 9.

Optimized position weight matrices in prediction of novel putative binding sites for transcription factors in the Drosophila melanogaster genome.

PLoS One. 2013 Aug 6;8(8):e68712. doi: 10.1371/journal.pone.0068712. Print 2013.

A Bayesian search for transcriptional motifs.

PLoS One. 2010 Nov 18;5(11):e13897. doi: 10.1371/journal.pone.0013897.

An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs.

BMC Bioinformatics. 2010 Nov 8;11:551. doi: 10.1186/1471-2105-11-551.

Molecular and structural considerations of TF-DNA binding for the generation of biologically meaningful and accurate phylogenetic footprinting analysis: the LysR-type transcriptional regulator family as a study model.

BMC Genomics. 2016 Aug 27;17(1):686. doi: 10.1186/s12864-016-3025-3.

Cited by

Collective behavior and self-organization in neural rosette morphogenesis.

Front Cell Dev Biol. 2023 Aug 10;11:1134091. doi: 10.3389/fcell.2023.1134091. eCollection 2023.

Maximum Entropy Technique and Regularization Functional for Determining the Pharmacokinetic Parameters in DCE-MRI.

J Digit Imaging. 2022 Oct;35(5):1176-1188. doi: 10.1007/s10278-022-00646-3. Epub 2022 May 26.

Modified Maximum Entropy Method and Estimating the AIF via DCE-MRI Data Analysis.

Entropy (Basel). 2022 Jan 20;24(2):155. doi: 10.3390/e24020155.

TOLOMEO, a Novel Machine Learning Algorithm to Measure Information and Order in Correlated Networks and Predict Their State.

Entropy (Basel). 2021 Aug 31;23(9):1138. doi: 10.3390/e23091138.

Analyzing a putative enhancer of optic disc morphology.

BMC Genet. 2020 Oct 22;21(Suppl 1):73. doi: 10.1186/s12863-020-00873-z.

An introduction to the maximum entropy approach and its application to inference problems in biology.

Heliyon. 2018 Apr 13;4(4):e00596. doi: 10.1016/j.heliyon.2018.e00596. eCollection 2018 Apr.

The folded k-spectrum kernel: A machine learning approach to detecting transcription factor binding sites with gapped nucleotide dependencies.

PLoS One. 2017 Oct 5;12(10):e0185570. doi: 10.1371/journal.pone.0185570. eCollection 2017.

Parametric bootstrapping for biological sequence motifs.

BMC Bioinformatics. 2016 Oct 6;17(1):406. doi: 10.1186/s12859-016-1246-8.

MyoD reprogramming requires Six1 and Six4 homeoproteins: genome-wide cis-regulatory module analysis.

Nucleic Acids Res. 2016 Oct 14;44(18):8621-8640. doi: 10.1093/nar/gkw512. Epub 2016 Jun 14.

Quantitative modeling of gene expression using DNA shape features of binding sites.

Nucleic Acids Res. 2016 Jul 27;44(13):e120. doi: 10.1093/nar/gkw446. Epub 2016 Jun 1.
