The ithree Institute, University of Technology Sydney, Ultimo, NSW, Australia.
Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia.
Microb Genom. 2019 Oct;5(10). doi: 10.1099/mgen.0.000298.
The global clone 1 (GC1) isolate AB307-0294, recovered in the USA in 1994, and the global clone 2 (GC2) isolate ACICU, recovered in Italy in 2005, were among the first isolates to be completely sequenced. AB307-0294 is susceptible to most antibiotics and has been used in many genetic studies, and ACICU belongs to a rare GC2 lineage. Both complete genome sequences were originally determined using 454 pyrosequencing, a technology known to generate sequencing errors. They were re-determined using Illumina MiSeq and MinION (Oxford Nanopore Technologies) data, with hybrid assemblies generated using Unicycler. Comparison of the new high-quality genomes with the earlier 454-sequenced versions identified a large number of nucleotide differences affecting protein coding sequence (CDS) features, and allowed the sequences of two long and highly repetitive genes to be properly resolved for the first time in ACICU. Comparison of the annotations of the original and revised genomes likewise revealed a large number of differences in the CDS features, underlining the impact of sequence errors on protein sequence predictions and core gene determination. On average, 400 predicted CDSs were longer or shorter in the revised genomes and about 200 CDS features were no longer present.
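The annotation comparison summarised above (CDSs that became longer, shorter, or disappeared) can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes the CDS features of the original and revised annotations have already been paired under a shared identifier, a hypothetical step the abstract does not describe, and the function name `compare_cds_lengths` and the example identifiers and lengths are invented for illustration.

```python
def compare_cds_lengths(original, revised):
    """Classify each original CDS against the revised annotation.

    Both arguments map a CDS identifier to its length in base pairs.
    Assumption: identifiers were paired between the two annotations
    beforehand (hypothetical step, not taken from the abstract).
    """
    summary = {"unchanged": 0, "longer": 0, "shorter": 0, "absent": 0}
    for cds_id, old_len in original.items():
        new_len = revised.get(cds_id)
        if new_len is None:
            # Feature predicted in the original genome but not in the revised one.
            summary["absent"] += 1
        elif new_len > old_len:
            summary["longer"] += 1
        elif new_len < old_len:
            summary["shorter"] += 1
        else:
            summary["unchanged"] += 1
    return summary


# Invented toy data, for illustration only.
original = {"cds1": 900, "cds2": 300, "cds3": 1200, "cds4": 450}
revised = {"cds1": 900, "cds2": 600, "cds3": 1000}
result = compare_cds_lengths(original, revised)
```

In practice the pairing step is the hard part, because insertions and deletions shift coordinates and frameshifts can split or merge predicted CDSs; the sketch deliberately leaves that out.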