The prediction of vertebrate promoter regions using differential hexamer frequency analysis.

Author information

Hutchinson G B

Affiliation

Department of Medical Genetics, University of British Columbia, Vancouver, Canada.

Publication information

Comput Appl Biosci. 1996 Oct;12(5):391-8. doi: 10.1093/bioinformatics/12.5.391.

Abstract

MOTIVATION

To develop an algorithm utilizing differential hexamer frequency analysis to discriminate promoter from non-promoter regions in vertebrate DNA sequence, without relying upon an extensive database of known transcriptional elements.

RESULTS

By determining hexamer frequencies in known promoter regions, coding regions and non-coding regions of vertebrate DNA sequence, and applying a formula first used by Claverie and Bougueleret (1986), a discriminant measure was created that compares promoter regions with coding (D1) and non-coding (D2) sequence. The algorithm correctly identifies the promoter regions in 18 of 29 loci (62.1%) from an independent test data set. With program options set to identify only one promoter region in the forward strand, there are 11 false-positive predictions in 208 714 nucleotides (one false positive in 18 974 single-stranded bp). With options set to analyze sequence in discrete segments, there is no appreciable improvement in sensitivity, whereas the specificity falls off predictably. It is of particular interest that a search for a peak score (independent of an absolute threshold) is more accurate than a search based upon a fixed scoring threshold. This suggests that the selection of promoter sites may be influenced by the global properties of an entire sequence domain, rather than exclusively by local characteristics.
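
The idea of scoring a sequence by differential hexamer frequencies and then taking the peak rather than applying a fixed threshold can be illustrated with a small Python sketch. The abstract does not reproduce the Claverie and Bougueleret formula, so the summed log frequency ratio used here is a stand-in assumption, not the published measure. The function names, the pseudocount, the window width and the toy training sequences are likewise hypothetical. Passing coding or non-coding frequencies as the background would give rough analogues of D1 and D2.

```python
from collections import Counter
from math import log

K = 6  # hexamer length


def hexamer_freqs(seqs, pseudocount=1.0):
    """Return a function giving the smoothed relative frequency of a hexamer.

    Frequencies are estimated from all overlapping hexamers in `seqs`.
    The pseudocount (an assumption of this sketch) avoids zero frequencies.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            counts[s[i:i + K]] += 1
    total = sum(counts.values())
    vocab = 4 ** K  # number of possible DNA hexamers

    def freq(hexamer):
        return (counts[hexamer] + pseudocount) / (total + pseudocount * vocab)

    return freq


def window_score(window, f_promoter, f_background):
    """Sum of log(promoter frequency / background frequency) over a window.

    This log-ratio form is assumed for illustration only; it is not taken
    from the paper.
    """
    window = window.upper()
    return sum(
        log(f_promoter(window[i:i + K]) / f_background(window[i:i + K]))
        for i in range(len(window) - K + 1)
    )


def peak_window(seq, f_promoter, f_background, width=20, step=1):
    """Return (start, score) of the highest-scoring window.

    No absolute threshold is applied: the best window in the sequence wins,
    mirroring the peak-score search described in the abstract.
    """
    best = None
    for start in range(0, len(seq) - width + 1, step):
        score = window_score(seq[start:start + width], f_promoter, f_background)
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy data, invented for this example (not real training sets).
f_prom = hexamer_freqs(["TATAAATATAAATATAAA"])
f_back = hexamer_freqs(["GCGCGCGCGCGCGCGCGC"])
query = "GCGCGCGCGCGC" + "TATAAATATAAA" + "GCGCGCGCGCGC"
start, score = peak_window(query, f_prom, f_back, width=12)
```

On this toy input the peak window begins where the TATA-rich insert starts. A fixed-threshold variant would instead report every window whose score exceeds a preset cutoff, which is the alternative the abstract found less accurate.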
