ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes.

Authors

Guo Feng-Biao, Zhang Chun-Ting

Affiliation

Department of Physics, Tianjin University, Tianjin 300072, China.

Publication

BMC Bioinformatics. 2006 Jan 10;7:9. doi: 10.1186/1471-2105-7-9.

Abstract

BACKGROUND

It is necessary to use highly accurate, statistics-based systems for viral and phage genome annotation. The GeneMark systems for gene-finding in virus and phage genomes suffer from some basic drawbacks. This paper puts forward an alternative approach to viral and phage gene-finding that aims to improve annotation quality, particularly for newly sequenced genomes.

RESULTS

The new system ZCURVE_V has been run on 979 viral and 212 phage genomes, with satisfactory results. For a fair comparison with GeneMark, the currently available software of similar function, 30 viral genomes that had not been annotated by GeneMark were selected for testing. In this test, the average specificity of the two systems is well matched, but the average sensitivity of ZCURVE_V for smaller viral genomes (< 100 kb), which make up the majority of viral genomes sequenced so far, is higher than that of GeneMark. Additionally, for the genome of Amsacta moorei entomopoxvirus, which probably has the lowest genomic GC content among sequenced organisms, the accuracy of ZCURVE_V is much better than that of GeneMark, because the latter predicts hundreds of false-positive genes. ZCURVE_V was also used to analyze well-studied genomes, such as HIV-1, HBV and SARS-CoV. Taken together, ZCURVE_V generally performs better than GeneMark. Finally, ZCURVE_V can be downloaded and run locally, which makes it easier to use, whereas GeneMark is not downloadable. Based on the above comparison, it is suggested that ZCURVE_V may serve as a preferred gene-finding tool for newly sequenced viral and phage genomes. However, it is also shown that the joint application of both systems, ZCURVE_V and GeneMark, leads to better gene-finding results. ZCURVE_V is freely available at: http://tubic.tju.edu.cn/Zcurve_V/.

CONCLUSION

ZCURVE_V may serve as a preferred gene-finding tool for viral and phage genomes, especially for newly sequenced anonymous genomes.




