Mathé Catherine, Sagot Marie-France, Schiex Thomas, Rouzé Pierre
Institut de Pharmacologie et Biologie Structurale, UMR 5089, 205 route de Narbonne, F-31077 Toulouse Cedex, France.
Nucleic Acids Res. 2002 Oct 1;30(19):4103-17. doi: 10.1093/nar/gkf543.
While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address one part of this problem: locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described, and the resulting software is classified according to both the method and the type of evidence used. Finally, the difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.