Finding prokaryotic genes by the 'frame-by-frame' algorithm: targeting gene starts and overlapping genes.

Author information

Shmatkov A M, Melikyan A A, Chernousko F L, Borodovsky M

Affiliation

Russian Academy of Science, Institute for Problems in Mechanics, Moscow, Russia.

Publication information

Bioinformatics. 1999 Nov;15(11):874-86. doi: 10.1093/bioinformatics/15.11.874.

Abstract

MOTIVATION

Tightly packed prokaryotic genes frequently overlap with each other. This feature, rarely seen in eukaryotic DNA, makes detection of translation initiation sites and, therefore, exact predictions of prokaryotic genes notoriously difficult. Improving the accuracy of precise gene prediction in prokaryotic genomic DNA remains an important open problem.

RESULTS

A software program was developed that implements a new algorithm using a uniform Hidden Markov Model for prokaryotic gene prediction. The algorithm analyzes a given DNA sequence in each of the six possible global reading frames independently. Twelve complete prokaryotic genomes were analyzed using the new tool. Two measures were assessed by comparison with existing annotation: the accuracy of gene finding (predicting the locations of protein-coding ORFs) and the accuracy of precise gene prediction (detecting the whole gene, including the translation initiation codon). In gene finding, the program performs at least as well as previously developed tools such as GeneMark and GLIMMER. In precise gene prediction, the new program was shown to be more accurate, by several percentage points, than earlier tools such as GeneMark.hmm, ECOPARSE and ORPHEUS. The test results also indicated a possible systematic bias in start codon annotation in several early sequenced prokaryotic genomes.
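To make the idea of six global reading frames concrete, the following is a minimal Python sketch that lists open reading frames separately for each of the three frames on the forward strand and the three on the reverse complement. It is an illustration only and not the Hidden Markov Model described in the abstract: the function names, the start codon set (ATG, GTG, TTG), and the `min_codons` parameter are assumptions chosen for the example, and no probabilistic scoring is performed.

```python
# Illustrative sketch only: a naive per-frame ORF scan, NOT the paper's HMM.
# All names and parameters below are hypothetical, chosen for this example.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common prokaryotic starts
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons=1):
    """Yield (start, end) half-open index pairs of ORFs in one frame.

    Each ORF runs from the first start codon seen after the previous
    stop up to and including the next in-frame stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon in START_CODONS:
            start = i
        elif start is not None and codon in STOP_CODONS:
            # Count codons from the start codon up to (not including) the stop.
            if (i - start) // 3 >= min_codons:
                yield (start, i + 3)
            start = None


def six_frame_orfs(seq, min_codons=1):
    """Scan each of the six reading frames independently.

    Keys "+1".."+3" are forward-strand frames; "-1".."-3" are frames of
    the reverse complement, with coordinates given on that strand.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    result = {}
    for offset in range(3):
        result[f"+{offset + 1}"] = list(orfs_in_frame(seq, offset, min_codons))
        result[f"-{offset + 1}"] = list(orfs_in_frame(rc, offset, min_codons))
    return result


if __name__ == "__main__":
    for frame, orfs in six_frame_orfs("ATGAAATAG").items():
        print(frame, orfs)
```

Because each frame is scanned on its own, ORFs reported in different frames may overlap in sequence coordinates, which is the situation the abstract identifies as difficult. Note that this sketch always picks the first in-frame start codon, which is exactly the kind of naive choice that precise gene prediction aims to improve on.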

AVAILABILITY

The new gene-finding program can be accessed through the Web site: http://dixie.biology.gatech.edu/GeneMark/fbf.cgi

CONTACT

mark@amber.gatech.edu.
