Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Oslo University Hospital, Rikshospitalet, NO-0027, Oslo, Norway, and Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316, Oslo, Norway.
Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, 2800 Lyngby, Denmark.
Microbiology (Reading). 2010 Mar;156(Pt 3):603-608. doi: 10.1099/mic.0.038257-0. Epub 2010 Jan 21.
There are now more than 1000 sequenced prokaryotic genomes deposited in public databases and available for analysis. Although the sequence databases GenBank, DNA Data Bank of Japan and EMBL are synchronized continually, their content currently differs slightly at the genome level for a variety of logistical reasons, including differences in format and loading errors, such as those caused by file transfer protocol interruptions. This means that the 1000th genome will be different in the various databases. Some of the data on highly accessed web pages are inaccurate, leading to false conclusions, for example about the largest bacterial genome sequenced. Biological diversity is far greater than many have thought. For example, analysis of multiple Escherichia coli genomes has led to an estimate of around 45 000 gene families, more genes than are recognized in the human genome. Moreover, not a single protein is conserved across all of the 1000 genomes available. Excluding the members of the Archaea, only four genes in total are conserved in all bacteria: two protein genes and two RNA genes.