Murakami K, Takagi T
1Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai Minato-ku, Tokyo 108-8639 and 2Central Research Laboratory, Hitachi Ltd, 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan.
Bioinformatics. 1998;14(8):665-75. doi: 10.1093/bioinformatics/14.8.665.
A number of programs have been developed to predict eukaryotic gene structures in DNA sequences. However, gene finding remains a challenging problem.
We explored how effective it is to re-analyze and combine the results of several gene-finding programs. We studied several combination methods using four programs (FEXH, GeneParser3, GENSCAN and GRAIL2). With the HIGHEST-policy combination method or the BOUNDARY method, approximate correlation (AC) improved by 3-5% compared with the best single gene-finding program. From another viewpoint, the OR-based combination of the four programs is the most reliable way to determine whether a candidate exon overlaps a real exon, although it is less sensitive than GENSCAN at exon-intron boundaries. Our methods can easily be extended to combine other programs.
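As a rough illustration of the ideas above, the following Python sketch combines exon predictions from several programs with a nucleotide-level OR rule and scores the result with the approximate correlation. It is a minimal sketch under stated assumptions, not the implementation used in this work: the abstract does not define the OR-based method in detail, so the per-nucleotide union is an assumption, as are the function names `or_combine` and `approximate_correlation`, the 0-based half-open exon coordinates, and the toy data. The AC formula follows the commonly used definition AC = 2(ACP - 0.5), where ACP averages TP/(TP+FN), TP/(TP+FP), TN/(TN+FP) and TN/(TN+FN) over the terms that are defined.

```python
def or_combine(predictions, length):
    """Assumed OR rule: a nucleotide is coding if any program puts it in an exon.

    predictions: one list of (start, end) exons per program,
                 0-based and end-exclusive (an assumption of this sketch).
    """
    coding = [False] * length
    for exons in predictions:
        for start, end in exons:
            for i in range(max(start, 0), min(end, length)):
                coding[i] = True
    return coding


def approximate_correlation(predicted, actual):
    """Nucleotide-level approximate correlation, AC = 2 * (ACP - 0.5)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Average only the conditional probabilities whose denominator is non-zero.
    ratios = [(tp, tp + fn), (tp, tp + fp), (tn, tn + fp), (tn, tn + fn)]
    terms = [num / den for num, den in ratios if den > 0]
    acp = sum(terms) / len(terms)
    return (acp - 0.5) * 2


# Hypothetical toy data: a 20-nt sequence with one real exon at [5, 10).
seq_len = 20
actual = or_combine([[(5, 10)]], seq_len)
program_a = [(5, 8)]    # hypothetical output of one program
program_b = [(7, 12)]   # hypothetical output of another program
combined = or_combine([program_a, program_b], seq_len)
ac = approximate_correlation(combined, actual)
print(f"AC of the OR combination: {ac:.3f}")
```

In this toy case the union covers the whole real exon but overshoots its 3' end, which mirrors the trade-off described above: an OR rule tends to catch overlapping exons reliably while being less precise at exon-intron boundaries.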
We have developed a server program (Shirokane System) and a client program (GeneScope) to use the methods. GeneScope is available through a WWW site (http://gf.genome.ad.jp/).
(katsu,takagi)@ims.u-tokyo.ac.jp