Lazeroff Matt, Ryder Geordie, Harris Sarah L, Tsourkas Philippos K
Department of Computer Science, University of Nevada Las Vegas, Las Vegas, Nevada, USA.
Department of Electrical and Computer Engineering, University of Nevada Las Vegas, Las Vegas, Nevada, USA.
Phage (New Rochelle). 2021 Dec 1;2(4):204-213. doi: 10.1089/phage.2020.0044. Epub 2021 Dec 16.
The number of sequenced bacteriophage genomes is growing at an exponential rate. The majority of sequenced bacteriophage genomes are annotated by one or more of several freely available gene identification programs (Glimmer, GeneMark, RAST, Prodigal, etc.). No program has been shown to consistently outperform the others; thus, the choice of which program to use is not obvious. We present the Phage Commander application for rapid identification of bacteriophage genes using multiple gene identification programs. Phage Commander runs a bacteriophage genome sequence through nine gene identification programs (and an additional program for identification of tRNAs) and integrates the results within a single output table. Phage Commander also generates formatted output files for direct export to National Center for Biotechnology Information GenBank or genome visualization programs such as DNA Master. Users can select the threshold for which genes to export (genes identified by at least one program, genes identified by at least two programs, etc.). Phage Commander was benchmarked using eight high-quality bacteriophage genomes whose genes are backed by experimental data. Our results show that the most accurate annotations are obtained by exporting genes identified by at least two or three programs. Many groups opt to manually curate the annotations obtained from gene identification programs, and Phage Commander was designed to facilitate manual curation of genome annotations. Our benchmarking results show that manual curation does indeed produce more accurate annotations than any individual gene identification program. The authors thus recommend manually curating the output of Phage Commander to generate maximally accurate annotations. Phage Commander is currently being used in the corresponding author's bacteriophage genome annotation class and has reduced the labor cost and improved the quality of genome annotations.
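The export threshold described above (genes identified by at least one program, at least two programs, and so on) can be illustrated with a minimal sketch. This is not Phage Commander's actual code: the function name `consensus_genes`, the input layout, and the example coordinates are hypothetical, and the sketch assumes two programs agree on a gene only when start, stop, and strand all match exactly, which the abstract does not specify.

```python
from collections import Counter


def consensus_genes(predictions, min_programs):
    """Return genes called by at least `min_programs` programs.

    `predictions` maps a program name to a list of gene calls, each a
    (start, stop, strand) tuple. Exact-coordinate matching is an
    assumption made for this illustration only.
    """
    counts = Counter()
    for genes in predictions.values():
        # set() ensures each program votes at most once per gene
        counts.update(set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_programs)


# Hypothetical calls from three programs on a made-up genome
predictions = {
    "Glimmer": [(100, 400, "+"), (500, 900, "-")],
    "GeneMark": [(100, 400, "+"), (950, 1200, "+")],
    "Prodigal": [(100, 400, "+"), (500, 900, "-")],
}

at_least_two = consensus_genes(predictions, min_programs=2)
```

Raising `min_programs` trades sensitivity for specificity: a threshold of 1 keeps every call from any program, while a threshold equal to the number of programs keeps only unanimous calls.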