Discriminative discovery of transcription factor binding sites from location data.

Authors

Kawada Yuji, Sakakibara Yasubumi

Affiliation

Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan.

Publication

Proc IEEE Comput Syst Bioinform Conf. 2005:86-9. doi: 10.1109/csb.2005.30.

Abstract

MOTIVATION

Genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offer new insight for the in silico analysis of transcriptional regulation.

RESULTS

We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples (the upstream sequences of genes bound and not bound by a transcription factor (TF)), based on genome-wide location data. Within this framework, our goal is to find the discriminative motifs that best explain the location data, in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree, and we extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif, and we iteratively refine the profile-HMMs to improve discrimination accuracy. Our genome-wide experiments on yeast show that the method successfully identifies the consensus sequences reported in the literature for known TFs and discriminates well between positive and negative samples for all TFs, whereas most other motif-detection methods perform very poorly on this discrimination task. The learned profile-HMMs also improve on the false-negative predictions of the ChIP data.
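The first step described above, choosing substrings that separate bound from unbound upstream sequences, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fixed-length substrings (k-mers), a presence/absence test, and information gain as the split criterion, which is one common choice for the root split of a text-classification decision tree. The function names `entropy`, `information_gain`, and `best_substring` are hypothetical, and the profile-HMM refinement step is not covered.

```python
from math import log2


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative samples."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(kmer, positives, negatives):
    """Entropy reduction from splitting the samples on 'contains kmer'."""
    pos_with = sum(kmer in s for s in positives)
    neg_with = sum(kmer in s for s in negatives)
    pos_without = len(positives) - pos_with
    neg_without = len(negatives) - neg_with
    total = len(positives) + len(negatives)
    before = entropy(len(positives), len(negatives))
    after = (
        (pos_with + neg_with) / total * entropy(pos_with, neg_with)
        + (pos_without + neg_without) / total * entropy(pos_without, neg_without)
    )
    return before - after


def best_substring(positives, negatives, k):
    """Return the k-mer whose presence best separates the two sample lists.

    Assumption: candidates are all k-mers occurring in any sample; ties are
    broken alphabetically so the result is deterministic.
    """
    candidates = {
        s[i:i + k]
        for s in positives + negatives
        for i in range(len(s) - k + 1)
    }
    return max(sorted(candidates),
               key=lambda m: information_gain(m, positives, negatives))


if __name__ == "__main__":
    # Toy sequences invented for illustration only, not real location data.
    bound = ["ACGTGACC", "TTCGTGAA", "GGCGTGAT"]
    unbound = ["ACCTTAGG", "TTAAGGCC", "GATTACAA"]
    print(best_substring(bound, unbound, 5))
```

A full decision tree would apply the same selection recursively to the two subsets produced by each split; the substrings chosen at the internal nodes are the candidates that the abstract describes clustering before building profile-HMMs.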
