OHMM: a Hidden Markov Model accurately predicting the occupancy of a transcription factor with a self-overlapping binding motif.

Authors

Drawid Amar, Gupta Nupur, Nagaraj Vijayalakshmi H, Gélinas Céline, Sengupta Anirvan M

Affiliation

BioMAPS Institute for Quantitative Biology, Rutgers University, Piscataway, NJ, USA.

Publication

BMC Bioinformatics. 2009 Jul 7;10:208. doi: 10.1186/1471-2105-10-208.

Abstract

BACKGROUND

The DNA binding motifs of several important transcription factors are self-overlapping. Many current methods for identifying regulatory sites do not explicitly account for overlapping sites. Moreover, most methods use arbitrary thresholds and fail to provide a biophysical interpretation of statistical quantities. In addition, commonly used approaches do not include the location of a site with respect to the transcription start site (TSS) in an integrated probabilistic framework while identifying sites. Ignoring these features can lead to inaccurate predictions as well as incorrect design and interpretation of experiments.

RESULTS

We have developed a tool based on a Hidden Markov Model (HMM) that identifies the binding locations of transcription factors that prefer self-overlapping DNA motifs by combining the effects of their alternative binding modes. Interpreting HMM parameters as biophysical quantities, this method uses the occupancy probability of a transcription factor on a DNA sequence as the discriminant function, earning the algorithm the name OHMM: Occupancy via Hidden Markov Model. OHMM learns the classification threshold by training emission probabilities on unaligned sequences containing known sites and by estimating transition probabilities to reflect the site density in all promoters in a genome. While identifying sites, it adjusts parameters to model how site density changes with distance from the transcription start site. Moreover, it provides guidance for designing padding sequences in gel shift experiments. For binding sites of the transcription factor NF-kappaB, we find that the occupancy probability predicted by OHMM correlates well with the binding affinity measured in gel shift experiments. High evolutionary conservation scores and enrichment in experimentally verified regulated genes suggest that NF-kappaB binding sites predicted by our method are likely to be functional.
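
To make the idea of occupancy with overlapping binding modes concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy HMM with one background state and one chain of motif states of width `w` (a single orientation only), a hypothetical position weight matrix `pwm`, and a hypothetical background-to-motif transition probability `p_site`. The function names are illustrative. Because two overlapping sites cannot be bound in the same configuration, the posterior probabilities of all sites covering a base can be summed to give the probability that the base is occupied.

```python
import numpy as np

BASES = "ACGT"

def site_start_posteriors(seq, pwm, p_site, bg=None):
    """Posterior probability that a bound site starts at each position.

    Toy model (an assumption, not the published OHMM): at every position the
    chain either emits one background base with weight (1 - p_site) * bg[base]
    or enters a motif of width w with weight p_site * prod(pwm[k, base_k]).
    p_site may be a scalar or an array with one value per position.
    """
    x = np.array([BASES.index(c) for c in seq])
    n, w = len(x), pwm.shape[0]
    bg = np.full(4, 0.25) if bg is None else np.asarray(bg, dtype=float)
    p = np.broadcast_to(np.asarray(p_site, dtype=float), (n,))

    # Emission weight of each length-w window under the motif model.
    site_w = np.array([np.prod(pwm[np.arange(w), x[s:s + w]])
                       for s in range(n - w + 1)])

    # fwd[i]: total weight of all configurations of the prefix seq[:i].
    fwd = np.zeros(n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        fwd[i] = fwd[i - 1] * (1 - p[i - 1]) * bg[x[i - 1]]
        if i >= w:
            fwd[i] += fwd[i - w] * p[i - w] * site_w[i - w]

    # bwd[i]: total weight of all configurations of the suffix seq[i:].
    bwd = np.zeros(n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        bwd[i] = (1 - p[i]) * bg[x[i]] * bwd[i + 1]
        if i + w <= n:
            bwd[i] += p[i] * site_w[i] * bwd[i + w]

    starts = np.array([fwd[s] * p[s] * site_w[s] * bwd[s + w]
                       for s in range(n - w + 1)])
    return starts / fwd[n]

def occupancy(seq, pwm, p_site, bg=None):
    """Probability that each base is covered by a bound factor."""
    starts = site_start_posteriors(seq, pwm, p_site, bg)
    w = pwm.shape[0]
    occ = np.zeros(len(seq))
    for s, prob in enumerate(starts):
        occ[s:s + w] += prob  # overlapping binding modes are mutually exclusive
    return occ

# Hypothetical self-overlapping motif "GGGG": shifting by one base gives
# another good match, so "GGGGG" holds two alternative binding modes.
toy_pwm = np.full((4, 4), 0.05)
toy_pwm[:, BASES.index("G")] = 0.85
toy_occ = occupancy("AAGGGGGAA", toy_pwm, p_site=0.01)
```

In this toy example the central G bases collect probability from both overlapping windows, so their occupancy exceeds the posterior of either window alone. The sketch works in plain probabilities and would underflow on long sequences; a scaled or log-space recursion would be needed there.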

CONCLUSION

Our method deals specifically with identifying locations with multiple overlapping binding sites by computing the local occupancy of the transcription factor. Moreover, treating OHMM as a biophysical model allows us to learn the classification threshold in a principled manner. Another feature of OHMM is that transition probabilities are allowed to change with location relative to the TSS. OHMM could be used to predict physical occupancy, and it provides guidance for the proper design of gel-shift experiments. Our predictions yielded new insights into NF-kappaB function and regulation and suggested possible new biological roles for NF-kappaB.
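
As an illustration of transition probabilities that change with location relative to the TSS, the sketch below builds a per-position background-to-motif transition probability. The exponential decay, the function name `site_entry_prob`, and all parameter values are assumptions chosen for illustration; the abstract does not state the functional form OHMM uses.

```python
import numpy as np

def site_entry_prob(positions, tss, p_far=1e-4, p_near=1e-3, scale=200.0):
    """Hypothetical position-dependent probability of entering a site.

    The probability equals p_near at the TSS and decays toward p_far with
    distance; `scale` (in bases) sets how quickly the decay happens.
    None of these values come from the paper.
    """
    d = np.abs(np.asarray(positions, dtype=float) - tss)
    return p_far + (p_near - p_far) * np.exp(-d / scale)

# One transition probability per base of a 1 kb promoter with the TSS at 800.
p_site = site_entry_prob(np.arange(1000), tss=800)
```

An array like `p_site` could replace a single genome-wide transition probability in any HMM whose background-to-motif transition is indexed by position.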

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6e0/2718928/d119e40ec788/1471-2105-10-208-1.jpg
