Discovering motifs in ranked lists of DNA sequences.

Author information

Eran Eden, Doron Lipson, Sivan Yogev, Zohar Yakhini

Affiliation

Computer Science Department, Technion, Haifa, Israel.

Publication information

PLoS Comput Biol. 2007 Mar 23;3(3):e39. doi: 10.1371/journal.pcbi.0030039.

Abstract

Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immunoprecipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights into transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic enhancement of TF binding to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked.
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP-chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
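
The abstract does not spell out the enrichment statistic, so the following is only a minimal sketch under our own assumptions, not DRIM's code. It illustrates challenges (i) and (ii): instead of fixing one target/background split, it scans every cutoff n of the ranked list, keeps the smallest hypergeometric tail probability (a minimum-hypergeometric, or mHG, score), and then computes an exact p-value for that minimum by counting every arrangement of the motif-containing sequences that never reaches a tail probability as small as the observed one. The function names `hg_tail`, `mhg_score` and `mhg_pvalue` are hypothetical, and exact `Fraction` arithmetic is used for clarity rather than speed.

```python
# Hypothetical sketch of a minimum-hypergeometric (mHG) enrichment score with
# an exact p-value; illustrative only, not taken from the DRIM source code.
from fractions import Fraction
from math import comb


def hg_tail(N: int, B: int, n: int, b: int) -> Fraction:
    """Exact P(X >= b) for X ~ Hypergeometric(N items, B marked, n drawn)."""
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return Fraction(hits, comb(N, n))


def mhg_score(labels: list[int]) -> tuple[Fraction, int]:
    """Smallest tail probability over every cutoff n = 1..N-1 of a ranked list.

    labels[i] is 1 if the sequence at rank i contains the motif, else 0.
    Returns (score, cutoff); the score is 1 when no cutoff shows enrichment.
    """
    N, B = len(labels), sum(labels)
    best, best_n, b = Fraction(1), N, 0
    for n in range(1, N):
        b += labels[n - 1]  # motif-containing sequences among the top n
        p = hg_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n


def mhg_pvalue(score: Fraction, N: int, B: int) -> Fraction:
    """Exact P(mHG <= score) when the B marked items are placed uniformly at random."""
    if score >= 1:
        return Fraction(1)  # the score can never exceed 1
    # paths[b]: number of length-n prefixes with b marked items that have not yet
    # passed through any cutoff cell whose tail probability is <= score
    paths = [1] + [0] * B
    for n in range(1, N + 1):
        new = [0] * (B + 1)
        for b in range(min(n, B) + 1):
            if n - b > N - B:
                continue  # would need more unmarked items than exist
            ways = paths[b] + (paths[b - 1] if b > 0 else 0)
            if n < N and hg_tail(N, B, n, b) <= score:
                ways = 0  # this prefix already reaches the observed score
            new[b] = ways
        paths = new
    # paths[B] counts complete rankings whose mHG stays above the score
    return 1 - Fraction(paths[B], comb(N, B))


labels = [1, 1, 1, 0, 0, 0, 0, 0]
score, cutoff = mhg_score(labels)
print(score, cutoff, mhg_pvalue(score, len(labels), sum(labels)))  # 1/56 3 1/56
```

In this toy ranking the three motif-containing sequences come first, so the best cutoff is n = 3 with a tail probability of 1/56; only that one arrangement out of C(8, 3) = 56 reaches it, which gives an exact p-value of 1/56. The path count visits O(N·B) cells and recomputes each tail exactly, so this version only suits small lists; a larger-scale version would more likely update the tails recursively in floating point or log space.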