

A decision tree system for finding genes in DNA.

Author information

Salzberg S, Delcher A L, Fasman K H, Henderson J

Affiliation

The Institute for Genomic Research, Rockville, Maryland 20850, USA.

Publication details

J Comput Biol. 1998 Winter;5(4):667-80. doi: 10.1089/cmb.1998.5.667.

Abstract

MORGAN is an integrated system for finding genes in vertebrate DNA sequences. MORGAN uses a variety of techniques to accomplish this task, the most distinctive of which is a decision tree classifier. The decision tree system is combined with new methods for identifying start codons, donor sites, and acceptor sites, and these are brought together in a frame-sensitive dynamic programming algorithm that finds the optimal segmentation of a DNA sequence into coding and noncoding regions (exons and introns). The optimal segmentation depends on a separate scoring function that takes a subsequence and assigns it a score reflecting the probability that the sequence is an exon. The scoring functions in MORGAN are sets of decision trees that are combined to give a probability estimate. Experimental results on a database of 570 vertebrate DNA sequences show that MORGAN has excellent performance by many different measures. On a separate test set, it achieves an overall accuracy of 95%, with a correlation coefficient of 0.78, and a sensitivity and specificity for coding bases of 83% and 79%, respectively. In addition, MORGAN identifies 58% of coding exons exactly; i.e., both the beginning and end of the coding regions are predicted correctly. This paper describes the MORGAN system, including its decision tree routines and the algorithms for site recognition, and its performance on a benchmark database of vertebrate DNA.
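
The idea of a frame-sensitive dynamic program can be illustrated with a minimal sketch. It assumes candidate exons have already been proposed (for example, from candidate start codons, donor sites, and acceptor sites) and that each carries an additive score, such as a log-odds value derived from a probability estimate. The function name `best_gene_model`, the `min_intron` parameter, and the simplified frame rule (total coding length must be a multiple of 3) are hypothetical. This is not MORGAN's actual algorithm or scoring.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate exon: (start, end, score), 0-based with end exclusive.
Exon = Tuple[int, int, float]
NEG = float("-inf")


def best_gene_model(exons: List[Exon], min_intron: int = 20) -> Tuple[float, List[Exon]]:
    """Highest-scoring chain of ordered, non-overlapping exons whose total
    coding length is a multiple of 3 (a simplified frame constraint).
    Hypothetical sketch, not MORGAN's implementation."""
    exons = sorted(exons, key=lambda e: e[0])
    n = len(exons)
    # best[i][p]: best score of a chain ending in exon i whose cumulative
    # coding length is congruent to p (mod 3); back[i][p] points to (j, q).
    best = [[NEG] * 3 for _ in range(n)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * 3 for _ in range(n)]
    for i, (start_i, end_i, score_i) in enumerate(exons):
        length = end_i - start_i
        # Option 1: exon i is the first exon of the chain.
        best[i][length % 3] = score_i
        # Option 2: extend a chain that ends in an earlier exon j.
        for j in range(i):
            if start_i - exons[j][1] < min_intron:
                continue  # overlapping, or the intron would be too short
            for q in range(3):
                if best[j][q] == NEG:
                    continue
                p = (q + length) % 3
                candidate = best[j][q] + score_i
                if candidate > best[i][p]:
                    best[i][p] = candidate
                    back[i][p] = (j, q)
    # A complete model must end in phase 0 (whole codons only).
    last, best_score = None, NEG
    for i in range(n):
        if best[i][0] > best_score:
            last, best_score = i, best[i][0]
    if last is None:
        return NEG, []
    # Follow back pointers to recover the exon chain.
    chain: List[Exon] = []
    state: Optional[Tuple[int, int]] = (last, 0)
    while state is not None:
        i, p = state
        chain.append(exons[i])
        state = back[i][p]
    chain.reverse()
    return best_score, chain


if __name__ == "__main__":
    candidates = [(0, 9, 2.0), (30, 42, 1.5), (35, 40, 5.0), (60, 63, 1.0)]
    print(best_gene_model(candidates))  # (3.5, [(0, 9, 2.0), (30, 42, 1.5)])
```

Carrying the phase (cumulative coding length mod 3) in the DP state is what makes the search frame-sensitive. In the example, the exon with the highest individual score, `(35, 40, 5.0)`, is left out because its 5-base length would leave the model out of frame.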

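The base-level figures in the abstract (accuracy, correlation coefficient, sensitivity, and specificity for coding bases) can be computed from counts of true and false positive and negative coding bases. The abstract does not give the exact formulas. The sketch below therefore assumes the convention common in gene-finding benchmarks, where sensitivity is TP/(TP+FN), specificity is TP/(TP+FP), the correlation coefficient is (TP·TN − FN·FP)/√((TP+FN)(TN+FP)(TP+FP)(TN+FN)), and accuracy is the fraction of correctly labeled bases. The function name `coding_base_metrics` and the `'C'`/`'N'` labels are hypothetical.

```python
import math
from typing import Dict


def coding_base_metrics(actual: str, predicted: str) -> Dict[str, float]:
    """Base-level evaluation of a gene prediction (hypothetical sketch).

    Each string labels every base: 'C' = coding, anything else = noncoding.
    Assumes specificity = TP / (TP + FP), a common gene-finding convention.
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have equal length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        is_coding, called_coding = a == "C", p == "C"
        if is_coding and called_coding:
            tp += 1
        elif called_coding:
            fp += 1
        elif is_coding:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tp / (tp + fp) if tp + fp else 0.0,
        "correlation": (tp * tn - fn * fp) / denom if denom else 0.0,
        "accuracy": (tp + tn) / len(actual) if actual else 0.0,
    }


if __name__ == "__main__":
    print(coding_base_metrics("NNCCCCNN", "NNNCCCCN"))
    # {'sensitivity': 0.75, 'specificity': 0.75, 'correlation': 0.5, 'accuracy': 0.75}
```

In the toy example, TP = 3, FP = 1, FN = 1, and TN = 3. The prediction is therefore 75% accurate, but the correlation coefficient is only 0.5. This gap is one reason gene-finding evaluations report several measures rather than accuracy alone.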