Gundogdu Ozan, Bentley Stephen D, Holden Matt T, Parkhill Julian, Dorrell Nick, Wren Brendan W
Pathogen Molecular Department, London School of Hygiene & Tropical Medicine, UK.
BMC Genomics. 2007 Jun 12;8:162. doi: 10.1186/1471-2164-8-162.
Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is now outdated. We describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, together with novel tools and annotation techniques that were not used during the original annotation.
Re-annotation was carried out using sequence database searches such as FASTA, with programs such as TMHMM providing additional support. The re-annotation also draws on sequence data from additional Campylobacter strains and species that were not available during the original annotation. It was accompanied by a full literature search, the results of which were incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% now carry additional information, namely newly identified motifs and/or relevant literature. Overall, 18.2% of coding sequence product functions were revised.
Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.