A general pairwise interaction model provides an accurate description of in vivo transcription factor binding sites.

Author information

Santolini Marc, Mora Thierry, Hakim Vincent

Affiliation

Laboratoire de Physique Statistique, CNRS, Université P. et M. Curie, Université D. Diderot, École Normale Supérieure, Paris, France.

Publication information

PLoS One. 2014 Jun 13;9(6):e99015. doi: 10.1371/journal.pone.0099015. eCollection 2014.

Abstract

The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to transcription factor (TF) binding. However, this description ignores correlations between nucleotides at different positions and is generally inaccurate: analysing fly and mouse in vivo ChIP-seq data, we show that in most cases the PWM model fails to reproduce the observed statistics of TFBSs. To overcome this issue, we introduce the pairwise interaction model (PIM), a generalization of the PWM model. The model is based on the principle of maximum entropy and explicitly describes pairwise correlations between nucleotides at different positions, while being otherwise as unconstrained as possible. It is mathematically equivalent to a TF-DNA binding energy that, like the PWM model, depends additively on the nucleotide identity at each position in the TFBS, but that also includes additive contributions from pairs of nucleotides. We find that the PIM significantly improves over the PWM model, and even provides an optimal description of TFBS statistics within statistical noise. The PIM generalizes previous approaches to interdependent positions: it accounts for co-variation of two or more base pairs and predicts secondary motifs, while outperforming multiple-motif models consisting of mixtures of PWMs. We analyse the structure of pairwise interactions between nucleotides and find that they are sparse and dominantly located between consecutive base pairs in the flanking region of TFBSs. Nonetheless, interactions between pairs of non-consecutive nucleotides are found to play a significant role in achieving this accurate description of TFBS statistics. The PIM is computationally tractable and provides a general framework that should be useful for describing and predicting TFBSs beyond PWMs.
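The additive single-site plus pairwise energy described above can be made concrete with a small scoring sketch. A sequence s of length L gets the log-weight S(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j) and probability P(s) = exp(S(s)) / Z, so that higher S corresponds to lower binding energy; the PWM is the special case J = 0. This is a hedged illustration, not the authors' code. Its assumptions are: fields are plain log base frequencies from aligned sites (no genomic background model), the coupling values are hypothetical, and the function names (pwm_fields, log_weight, log_partition) are made up for illustration. Fitting J so that the model reproduces observed pairwise frequencies, which is the hard part of the maximum-entropy inference, is not shown.

```python
import math
from itertools import product

BASES = "ACGT"


def pwm_fields(sites, pseudocount=1.0):
    """Fields h[i][b] = log f_i(b) from aligned, equal-length sites.

    This is the maximum-entropy model constrained only by single-position
    base frequencies, i.e. independent positions (the PWM case).
    Assumption: log frequencies, not log-odds against a genomic background.
    """
    fields = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        fields.append({b: math.log(c / total) for b, c in counts.items()})
    return fields


def log_weight(seq, fields, couplings=None):
    """S(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j).

    couplings maps a position pair (i, j) to a dict {(a, b): J_ij(a, b)};
    missing entries count as 0. With no couplings this is the PWM score.
    """
    total = sum(fields[i][b] for i, b in enumerate(seq))
    for (i, j), table in (couplings or {}).items():
        total += table.get((seq[i], seq[j]), 0.0)
    return total


def log_partition(fields, couplings=None):
    """log Z = log sum_s exp(S(s)), by brute-force enumeration (short motifs only)."""
    return math.log(sum(
        math.exp(log_weight("".join(seq), fields, couplings))
        for seq in product(BASES, repeat=len(fields))
    ))


# Toy aligned sites in which positions 0 and 3 co-vary (A...T or T...A).
sites = ["ACGT", "TCGA", "ACGT", "TCGA"]
h = pwm_fields(sites)
# Hypothetical pair coupling rewarding the co-varying combinations.
J = {(0, 3): {("A", "T"): 1.0, ("T", "A"): 1.0}}

pwm_gap = log_weight("ACGT", h) - log_weight("ACGA", h)
pim_gap = log_weight("ACGT", h, J) - log_weight("ACGA", h, J)
print(round(pwm_gap, 6), round(pim_gap, 6))  # prints: 0.0 1.0
```

In the toy alignment, positions 0 and 3 co-vary, yet the PWM scores ACGT and ACGA identically because it only sees single-position frequencies; the single hypothetical coupling J_03 separates them. For the PWM fields, Z equals 1 up to rounding, since the product of per-position frequencies sums to 1 over all sequences. Enumerating Z costs 4^L evaluations, so this brute-force approach is only meant for short toy motifs.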

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e7/4057186/275c2ee5e67c/pone.0099015.g001.jpg