Poole Farris L, Gerwe Brian A, Hopkins Robert C, Schut Gerrit J, Weinberg Michael V, Jenney Francis E, Adams Michael W W
Department of Biochemistry and Molecular Biology, Davison Life Sciences Complex, University of Georgia, Athens, GA 30602-7229, USA.
J Bacteriol. 2005 Nov;187(21):7325-32. doi: 10.1128/JB.187.21.7325-7332.2005.
The original genome annotation of the hyperthermophilic archaeon Pyrococcus furiosus contained 2,065 open reading frames (ORFs). The genome was subsequently automatically annotated in two public databases by the Institute for Genomic Research (TIGR) and the National Center for Biotechnology Information (NCBI). Remarkably, more than 500 of the originally annotated ORFs differ in size in the two databases, many very significantly. For example, more than 170 of the predicted proteins differ at their N termini by more than 25 amino acids. Similar discrepancies were observed in the TIGR and NCBI databases with the other archaeal and bacterial genomes examined. In addition, the two databases contain 60 (NCBI) and 221 (TIGR) ORFs not present in the original annotation of P. furiosus. In the present study we have experimentally assessed the validity of 88 previously unannotated ORFs. Transcriptional analyses showed that 11 of 61 ORFs examined were expressed in P. furiosus when grown at either 95 or 72°C. In addition, 7 of 54 ORFs examined yielded heat-stable recombinant proteins when they were expressed in Escherichia coli, although only one of the seven ORFs was expressed in P. furiosus under the growth conditions tested. It is concluded that the P. furiosus genome contains at least 17 ORFs not previously recognized in the original annotation. This study serves to highlight the discrepancies in the public databases and the problems of accurately defining the number and sizes of ORFs within any microbial genome.
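The kind of N-terminal discrepancy described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that two annotations of the same ORF share a strand and stop-codon coordinate and differ only in the chosen start codon; the abstract does not state how ORFs were matched between the databases. The `Orf` class, the `n_terminal_differences` function, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """One annotated ORF (hypothetical record layout)."""
    strand: str       # "+" or "-"
    five_prime: int   # genome coordinate of the first base of the start codon
    three_prime: int  # genome coordinate of the last base of the stop codon


def n_terminal_differences(annotation_a, annotation_b, min_residues=25):
    """Return (orf_a, orf_b, residues) for ORFs that share a strand and stop
    coordinate but whose predicted N termini differ by more than min_residues
    amino acids. Matching on the stop coordinate is an assumption."""
    by_stop_b = {(o.strand, o.three_prime): o for o in annotation_b}
    flagged = []
    for orf_a in annotation_a:
        orf_b = by_stop_b.get((orf_a.strand, orf_a.three_prime))
        if orf_b is None:
            continue  # present in only one annotation
        # Three nucleotides per codon, so integer-divide the start offset.
        residues = abs(orf_a.five_prime - orf_b.five_prime) // 3
        if residues > min_residues:
            flagged.append((orf_a, orf_b, residues))
    return flagged


# Made-up coordinates, not taken from the P. furiosus genome.
annotation_a = [Orf("+", 100, 1000), Orf("-", 5000, 4100), Orf("+", 2000, 2600)]
annotation_b = [Orf("+", 190, 1000), Orf("-", 5000, 4100), Orf("+", 3000, 3900)]
flagged = n_terminal_differences(annotation_a, annotation_b)
```

In this toy input only the first pair is flagged: the starts are 90 nucleotides apart, which corresponds to 30 residues. The second pair is identical, and the third ORF has no counterpart with the same stop coordinate, so it would count as an ORF present in only one annotation.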