Guo Feng-Biao, Ou Hong-Yu, Zhang Chun-Ting
Department of Physics, Tianjin University, Tianjin 300072, China.
Nucleic Acids Res. 2003 Mar 15;31(6):1780-9. doi: 10.1093/nar/gkg254.
A new system, ZCURVE 1.0, is proposed for finding protein-coding genes in bacterial and archaeal genomes. The algorithm is based on the Z curve representation of DNA sequences and emphasizes the global statistical features of protein-coding genes by taking into account the frequencies of bases at the three codon positions. Because ZCURVE 1.0 uses only 33 parameters to characterize coding sequences, it gives better consideration to both typical and atypical cases, whereas Markov-model-based methods, e.g. Glimmer 2.02, train thousands of parameters, which may result in less adaptability. To compare the performance of the new system with that of Glimmer 2.02, both systems were run on 18 genomes not annotated by the Glimmer system. The two systems were also compared on the prediction of some genes of known function. The average accuracies of the two systems are comparable; however, ZCURVE 1.0 predicts gene starts more accurately, has a lower additional prediction rate, and is more accurate in predicting horizontally transferred genes. The joint application of both systems is shown to greatly improve gene-finding results. For a typical genome, e.g. Escherichia coli, ZCURVE 1.0 takes approximately 2 min on a Pentium III 866 PC without any human intervention. ZCURVE 1.0 is freely available at: http://tubic.tju.edu.cn/Zcurve_B/.
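As a rough illustration of what "frequencies of bases at three codon positions" can mean in Z curve terms, the sketch below computes nine phase-specific components for an in-frame sequence. It assumes the standard Z curve transform (x for purine versus pyrimidine, y for amino versus keto, z for weak versus strong hydrogen bonding). The abstract does not list the 33 parameters, so this covers only the nine components that follow directly from the per-position base frequencies, and the function name `phase_specific_z_components` is hypothetical, not part of the ZCURVE 1.0 software.

```python
from collections import Counter


def phase_specific_z_components(orf: str) -> list[float]:
    """Return [x1, y1, z1, x2, y2, z2, x3, y3, z3] for an in-frame sequence.

    Hypothetical helper for illustration; not the ZCURVE 1.0 implementation.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    components: list[float] = []
    for phase in range(3):
        # Every third base, starting at codon position phase + 1.
        counts = Counter(orf[phase:usable:3])
        # Characters other than A/C/G/T (e.g. N) are left out of the total.
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            raise ValueError("no A/C/G/T bases at codon position %d" % (phase + 1))
        a, c, g, t = (counts[b] / total for b in "ACGT")
        components += [
            (a + g) - (c + t),  # x: purine vs pyrimidine
            (a + c) - (g + t),  # y: amino vs keto
            (a + t) - (g + c),  # z: weak vs strong hydrogen bonding
        ]
    return components
```

Each component lies in [-1, 1], so a candidate open reading frame maps to a point in a nine-dimensional space; how ZCURVE 1.0 combines these with its remaining parameters to separate coding from non-coding sequences is not described in the abstract.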