ARTS: accurate recognition of transcription starts in human.

Authors

Sonnenburg Sören, Zien Alexander, Rätsch Gunnar

Affiliation

Fraunhofer Institute FIRST, Kekuléstr. 7, Berlin, Germany.

Publication

Bioinformatics. 2006 Jul 15;22(14):e472-80. doi: 10.1093/bioinformatics/btl250.

Abstract

We develop new methods for finding transcription start sites (TSS) of RNA Polymerase II binding genes in genomic DNA sequences. Employing Support Vector Machines with advanced sequence kernels, we achieve drastically higher prediction accuracies than state-of-the-art methods.
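To make the phrase "sequence kernels" concrete, here is a minimal sketch of one simple kernel on DNA strings, the k-mer spectrum kernel. The abstract does not say which kernels ARTS uses, so this choice is an illustrative assumption, and the function names `kmer_counts` and `spectrum_kernel` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b.

    Illustrative only: not necessarily a kernel used by ARTS.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Iterate over the smaller table; k-mers missing from the other count as 0.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())
```

A kernel value of this kind can be supplied to a Support Vector Machine in place of an explicit feature vector, so the classifier compares sequences by their shared substrings.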

MOTIVATION

Protein-coding genes are among the most important features of genomic DNA. While it is of great value to identify those genes and the proteins they encode, it is also crucial to understand how their transcription is regulated. To this end, one has to identify the corresponding promoters and the transcription factor binding sites they contain. TSS finders can be used to locate potential promoters. They may also be used in combination with other signal and content detectors to resolve entire gene structures.

RESULTS

We have developed a novel kernel-based method, called ARTS, that accurately recognizes transcription start sites in human. Efficient training and evaluation techniques based on suffix tries made it possible to apply Support Vector Machines, which would otherwise be too computationally expensive. In a carefully designed experimental study, we compare our TSS finder to state-of-the-art methods from the literature: McPromoter, Eponine and FirstEF. For given false positive rates within a reasonable range, we consistently achieve considerably higher true positive rates. For instance, ARTS finds about 35% true positives at a false positive rate of 1/1000, whereas the other methods find about half as many (18%).
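The comparison above reports true positive rates at a fixed false positive rate. The sketch below shows one way such a figure could be computed from classifier scores and binary labels. It is not the authors' evaluation protocol, and the function name `tpr_at_fpr` and its parameters are hypothetical.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate reachable by a score threshold whose
    false positive rate does not exceed max_fpr.

    scores: classifier outputs, higher means "more likely a TSS".
    labels: truthy for true TSS positions, falsy for decoys.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Lower the threshold one distinct score at a time, so that tied
        # scores are accepted or rejected together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break
        best = tp / n_pos
        i = j
    return best
```

Under this reading, the figures quoted above correspond to `tpr_at_fpr(scores, labels, 1 / 1000)` returning roughly 0.35 for ARTS and roughly 0.18 for the other methods on the same test set.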

AVAILABILITY

Datasets, model selection results, whole genome predictions, and additional experimental results are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/arts.

