School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA.
Int J Mol Sci. 2019 Jul 10;20(14):3391. doi: 10.3390/ijms20143391.
Bacteriophages are the most numerous biological entities on Earth. The number of sequenced phage genomes is approximately 8000 and is increasing rapidly. Sequencing of a genome is followed by annotation, in which genes, start codons, and functions are putatively identified. The mainstays of phage genome annotation are auto-annotation programs such as Glimmer and GeneMark. Because phage genomes are relatively small, many groups choose to manually curate auto-annotation results to increase accuracy. An additional benefit of manually curating auto-annotated phage genomes is that the process is amenable to being performed by students and has been shown to improve student recruitment to the sciences. However, despite its greater accuracy and pedagogical value, manual curation suffers from high labor cost, a lack of standardization, a degree of subjectivity in decision-making, and susceptibility to mistakes. Here, we present a method developed in our lab that is designed to produce accurate annotations while reducing subjectivity and providing a degree of standardization in decision-making. We show that our method produces genome annotations that are more accurate than those of auto-annotation programs, while retaining the pedagogical benefits of manual genome curation.