ZCURVE_V: a new self-training system for recognizing protein-coding genes in viral and phage genomes.

Authors

Guo Feng-Biao, Zhang Chun-Ting

Affiliation

Department of Physics, Tianjin University, Tianjin 300072, China.

Publication

BMC Bioinformatics. 2006 Jan 10;7:9. doi: 10.1186/1471-2105-7-9.

Abstract

BACKGROUND

It is necessary to use highly accurate, statistics-based systems for viral and phage genome annotation. The GeneMark systems for gene-finding in viral and phage genomes suffer from some basic drawbacks. This paper puts forward an alternative approach to viral and phage gene-finding that aims to improve the quality of annotations, particularly for newly sequenced genomes.

RESULTS

The new system, ZCURVE_V, has been run on 979 viral and 212 phage genomes, with satisfactory results. For a fair comparison with GeneMark, the currently available software of similar function, a total of 30 viral genomes that have not been annotated by GeneMark were selected for testing. The average specificity of the two systems is well matched; however, the average sensitivity of ZCURVE_V for smaller viral genomes (< 100 kb), which constitute the majority of viral genomes sequenced so far, is higher than that of GeneMark. Additionally, for the genome of Amsacta moorei entomopoxvirus, which probably has the lowest genomic GC content among sequenced organisms, the accuracy of ZCURVE_V is much better than that of GeneMark, because the latter predicts hundreds of false-positive genes. ZCURVE_V was also used to analyze well-studied genomes, such as HIV-1, HBV and SARS-CoV. Taken together, the performance of ZCURVE_V is generally better than that of GeneMark. Finally, ZCURVE_V can be downloaded and run locally, which makes it easier to use, whereas GeneMark is not downloadable. Based on the above comparison, ZCURVE_V may serve as a preferred gene-finding tool for newly sequenced viral and phage genomes. However, it is also shown that the joint application of both systems, ZCURVE_V and GeneMark, leads to better gene-finding results. The system ZCURVE_V is freely available at: http://tubic.tju.edu.cn/Zcurve_V/.
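The sensitivity and specificity figures compared above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: sensitivity is taken as the fraction of annotated genes that are predicted, specificity as the fraction of predicted genes that are annotated (a common convention in gene-finding), and a prediction counts as correct when it shares strand and 3' end with an annotated gene. The `Gene` type, the function names, and the coordinates are hypothetical and are not part of ZCURVE_V or GeneMark.

```python
from typing import Iterable, NamedTuple, Tuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate on the forward strand
    end: int     # rightmost coordinate on the forward strand
    strand: str  # "+" or "-"


def three_prime_key(gene: Gene) -> Tuple[str, int]:
    """Identify a gene by its strand and 3' end (assumed matching rule)."""
    # On the "+" strand the 3' end is the rightmost coordinate;
    # on the "-" strand it is the leftmost coordinate.
    return (gene.strand, gene.end if gene.strand == "+" else gene.start)


def sensitivity_specificity(
    annotated: Iterable[Gene], predicted: Iterable[Gene]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the gene level."""
    ann = {three_prime_key(g) for g in annotated}
    pred = {three_prime_key(g) for g in predicted}
    true_positives = len(ann & pred)
    sensitivity = true_positives / len(ann) if ann else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates, for illustration only.
annotated = [
    Gene(100, 400, "+"),
    Gene(500, 900, "+"),
    Gene(1000, 1300, "-"),
    Gene(1500, 1800, "+"),
]
predicted = [
    Gene(130, 400, "+"),    # different start, same 3' end: counted as a match
    Gene(500, 900, "+"),    # exact match
    Gene(1000, 1250, "-"),  # minus strand, same 3' end: counted as a match
    Gene(2000, 2300, "+"),  # false positive
    Gene(2500, 2800, "-"),  # false positive
]

sn, sp = sensitivity_specificity(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under this definition, a tool that predicts hundreds of false-positive genes keeps its sensitivity but loses specificity, because the false positives enlarge only the denominator of the second ratio.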

CONCLUSION

ZCURVE_V may serve as a preferred gene-finding tool for viral and phage genomes, especially for newly sequenced anonymous viral and phage genomes.
