ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes.

Authors

Guo Feng-Biao, Ou Hong-Yu, Zhang Chun-Ting

Affiliation

Department of Physics, Tianjin University, Tianjin 300072, China.

Publication

Nucleic Acids Res. 2003 Mar 15;31(6):1780-9. doi: 10.1093/nar/gkg254.

DOI:10.1093/nar/gkg254

PMID:12626720

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC152858/

Abstract

A new system, ZCURVE 1.0, for finding protein-coding genes in bacterial and archaeal genomes has been proposed. The current algorithm, which is based on the Z curve representation of the DNA sequences, lays stress on the global statistical features of protein-coding genes by taking the frequencies of bases at three codon positions into account. In ZCURVE 1.0, since only 33 parameters are used to characterize the coding sequences, it gives better consideration to both typical and atypical cases, whereas in Markov-model-based methods, e.g. Glimmer 2.02, thousands of parameters are trained, which may result in less adaptability. To compare the performance of the new system with that of Glimmer 2.02, both systems were run, respectively, for 18 genomes not annotated by the Glimmer system. Comparisons were also performed for predicting some function-known genes by both systems. Consequently, the average accuracy of both systems is well matched; however, ZCURVE 1.0 has more accurate gene start prediction, lower additional prediction rate and higher accuracy for the prediction of horizontally transferred genes. It is shown that the joint applications of both systems greatly improve gene-finding results. For a typical genome, e.g. Escherichia coli, the system ZCURVE 1.0 takes approximately 2 min on a Pentium III 866 PC without any human intervention. The system ZCURVE 1.0 is freely available at: http://tubic.tju.edu.cn/Zcurve_B/.
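
To make the idea of "frequencies of bases at three codon positions" concrete, the following is a minimal sketch, not the ZCURVE 1.0 implementation. It applies the standard Z-curve transform to the base frequencies at each of the three codon positions of a candidate ORF, giving nine numbers. The abstract does not enumerate all 33 parameters, so the remaining ones are not reproduced here. The function name `zcurve_codon_features` and the choice to ignore non-ACGT symbols are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def zcurve_codon_features(orf: str) -> List[float]:
    """Return nine Z-curve components: (x, y, z) for codon positions 1, 2 and 3.

    Hypothetical helper for illustration; it is not part of the ZCURVE software.
    """
    seq = orf.upper()
    features: List[float] = []
    for phase in range(3):
        # Every third base starting at this offset belongs to one codon position.
        counts = Counter(seq[phase::3])
        total = sum(counts[b] for b in "ACGT")  # assumption: skip N and other symbols
        if total == 0:
            raise ValueError(f"no A/C/G/T bases at codon position {phase + 1}")
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features.extend([
            (a + g) - (c + t),  # x: purines versus pyrimidines
            (a + c) - (g + t),  # y: amino versus keto bases
            (a + t) - (g + c),  # z: weak versus strong hydrogen bonding
        ])
    return features


if __name__ == "__main__":
    print(zcurve_codon_features("ATGAAACGCATTAGCACCTAA"))
```

Each component lies in the range [-1, 1]. A feature vector of this kind could be passed to any linear discriminant to separate coding from non-coding ORFs; which classifier ZCURVE 1.0 uses is not stated in the abstract.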
