Predicting Pol II promoter sequences using transcription factor binding sites.

Author information

Prestridge D S

Affiliation

Molecular Biology Computing Center, University of Minnesota, St Paul 55108, USA.

Publication information

J Mol Biol. 1995 Jun 23;249(5):923-32. doi: 10.1006/jmbi.1995.0349.

Abstract

A computer program, PROMOTER SCAN, has been developed to recognize a high percentage of Pol II promoter sequences while allowing only a small rate of false positives. A total of 167 primate Pol II promoter sequences, obtained from the Eukaryotic Promoter Database, and 999 primate non-promoter sequences, obtained from the GenBank sequence databank, were used in the analysis. Both promoter and non-promoter sequences were analyzed for the comparative density of each unique mammalian transcription factor binding site listed in the Ghosh Transcription Factor Database. The density of each of these binding sites was then used to derive a ratio of density of each transcriptional element in promoter compared to non-promoter sequences. The combined individual density ratios of all binding sites were then collectively used to build a scoring profile called the Promoter Recognition Profile. This profile, used in combination with a weighted matrix for scoring a TATA box, was then used by the PROMOTER SCAN program to test the prediction of promoter sequences and the ability of the computer program to discriminate them from non-promoter sequences. When the promoter cutoff score was set so that 70% of promoters were recognized correctly by the program, a false positive rate of about 1/5600 bases was observed in the non-promoter sequence set. PROMOTER SCAN is now being developed for public distribution.
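The scoring scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PROMOTER SCAN implementation. Binding sites are treated as exact consensus strings (the Ghosh database entries are not reproduced here). A window's score is taken to be the sum of the density ratios of every site occurrence in it, plus its best TATA weight-matrix score. All helper names, the pseudocount, the 250-base window and the 50-base step are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Sequence

# Hypothetical sketch, not the published PROMOTER SCAN code.
# A binding site is an exact consensus string; a TATA weight matrix is
# a list of per-position base scores (both are assumptions).
Profile = Dict[str, float]
WeightMatrix = List[Dict[str, float]]


def count_occurrences(seq: str, site: str) -> int:
    """Count exact, possibly overlapping matches of `site` in `seq`."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def density(seqs: Sequence[str], site: str) -> float:
    """Occurrences of `site` per base across a whole sequence set."""
    total_bases = sum(len(s) for s in seqs)
    hits = sum(count_occurrences(s, site) for s in seqs)
    return hits / total_bases if total_bases else 0.0


def build_profile(promoters: Sequence[str], non_promoters: Sequence[str],
                  sites: Sequence[str], pseudocount: float = 1e-6) -> Profile:
    """Density ratio (promoter / non-promoter) for every binding site.

    The pseudocount is an assumption of this sketch; it avoids dividing
    by zero when a site never occurs in the non-promoter set.
    """
    return {
        site: (density(promoters, site) + pseudocount)
        / (density(non_promoters, site) + pseudocount)
        for site in sites
    }


def best_matrix_score(seq: str, matrix: WeightMatrix) -> float:
    """Highest weight-matrix score over all alignments of `matrix` to `seq`."""
    width = len(matrix)
    scores = [
        sum(matrix[j].get(seq[i + j], 0.0) for j in range(width))
        for i in range(len(seq) - width + 1)
    ]
    return max(scores, default=0.0)


def score_window(window: str, profile: Profile, tata: WeightMatrix) -> float:
    """Sum of density ratios of every site hit, plus the best TATA score."""
    site_score = sum(ratio * count_occurrences(window, site)
                     for site, ratio in profile.items())
    return site_score + best_matrix_score(window, tata)


def false_positives_per_base(non_promoters: Sequence[str], profile: Profile,
                             tata: WeightMatrix, cutoff: float,
                             window: int = 250, step: int = 50) -> float:
    """Above-cutoff windows per scanned base in the non-promoter set."""
    hits = bases = 0
    for seq in non_promoters:
        bases += len(seq)
        for start in range(0, max(len(seq) - window, 0) + 1, step):
            if score_window(seq[start:start + window], profile, tata) >= cutoff:
                hits += 1
    return hits / bases if bases else 0.0
```

With real data, `cutoff` would be raised until about 70% of the promoter sequences still score above it; `1 / false_positives_per_base(...)` then gives the "one false positive per N bases" figure, which the abstract reports as roughly 5600.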
