Guo Feng-Biao, Zhang Chun-Ting
Department of Physics, Tianjin University, Tianjin 300072, China.
BMC Bioinformatics. 2006 Jan 10;7:9. doi: 10.1186/1471-2105-7-9.
Viral and phage genome annotation requires highly accurate, statistics-based gene-finding systems. The GeneMark systems for gene-finding in virus and phage genomes suffer from some basic drawbacks. This paper puts forward an alternative approach for viral and phage gene-finding to improve the quality of annotations, particularly for newly sequenced genomes.
The new system, ZCURVE_V, has been run on 979 viral and 212 phage genomes, with satisfactory results. For a fair comparison with GeneMark, the currently available software of similar function, 30 viral genomes that had not been annotated by GeneMark were selected for testing. The average specificity of the two systems is well matched. However, the average sensitivity of ZCURVE_V is higher than that of GeneMark for smaller viral genomes (< 100 kb), which make up the majority of viral genomes sequenced so far. Additionally, for the genome of Amsacta moorei entomopoxvirus, which probably has the lowest genomic GC content among sequenced organisms, ZCURVE_V is much more accurate than GeneMark, because the latter predicts hundreds of false-positive genes. ZCURVE_V was also used to analyze well-studied genomes, such as HIV-1, HBV and SARS-CoV. Overall, the performance of ZCURVE_V is generally better than that of GeneMark. Finally, ZCURVE_V can be downloaded and run locally, which makes it easier to use, whereas GeneMark is not downloadable. Based on this comparison, it is suggested that ZCURVE_V may serve as a preferred gene-finding tool for newly sequenced viral and phage genomes. However, it is also shown that the joint application of both systems, ZCURVE_V and GeneMark, leads to better gene-finding results. The system ZCURVE_V is freely available at: http://tubic.tju.edu.cn/Zcurve_V/.
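The sensitivity and specificity figures above can be illustrated with a minimal sketch. It assumes the convention common in the gene-finding literature, where sensitivity is the fraction of annotated genes that are predicted and specificity is the fraction of predicted genes that are annotated; the abstract itself does not spell out its definitions. Matching genes by exact coordinates and strand is a further simplifying assumption, and the function name `evaluate` and the toy gene lists are hypothetical, not taken from ZCURVE_V or GeneMark.

```python
from typing import Set, Tuple

# A gene is represented as (start, end, strand); this representation is an
# assumption made for illustration only.
Gene = Tuple[int, int, str]


def evaluate(predicted: Set[Gene], annotated: Set[Gene]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) under exact-coordinate matching."""
    tp = len(predicted & annotated)   # predicted genes that are annotated
    fn = len(annotated - predicted)   # annotated genes that were missed
    fp = len(predicted - annotated)   # predicted genes with no annotation
    sensitivity = tp / (tp + fn) if annotated else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


# Toy data, invented for this example.
annotated = {(1, 300, "+"), (400, 900, "+"), (1000, 1500, "-"), (1600, 2000, "+")}
predicted = {(1, 300, "+"), (400, 900, "+"), (1000, 1500, "-"), (2100, 2400, "-")}

sensitivity, specificity = evaluate(predicted, annotated)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Under this convention, a gene finder that predicts hundreds of false-positive genes keeps its sensitivity but loses specificity, which is the kind of failure the abstract reports for GeneMark on the low-GC Amsacta moorei entomopoxvirus genome.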
ZCURVE_V may serve as a preferred gene-finding tool for viral and phage genomes, especially for newly sequenced, anonymous viral and phage genomes.