Genix: a new online automated pipeline for bacterial genome annotation.
Author information
Kremer Frederico Schmitt, Eslabão Marcus Redü, Dellagostin Odir Antônio, Pinto Luciano da Silva
Affiliations
Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Rio Grande do Sul, Brazil, 96010-610
Publication information
FEMS Microbiol Lett. 2016 Dec;363(23). doi: 10.1093/femsle/fnw263. Epub 2016 Nov 16.
Next-generation sequencing has significantly reduced the cost of genome-sequencing projects, resulting in a marked increase in the availability of genomic data in public databases. The cheaper and easier it becomes to sequence new genomes, the more accurate the annotation steps have to be, to avoid both the loss of information and the accumulation of erroneous features that may affect the accuracy of further analyses. For bacterial genomes, a range of web-based annotation software has been developed; however, many applications have yet to incorporate the steps required to improve their results, including the removal of false-positive (spurious) features and a more complete identification of non-coding features. We present Genix, a new web-based bacterial genome annotation pipeline. A comparison of the results generated by Genix for four reference genomes against those generated by other annotation tools indicated that our pipeline provides results closer to the reference genome annotation, with fewer false-positive proteins and fewer missing functionally annotated proteins. Additionally, the metrics obtained by Genix were slightly better than those obtained by Prokka, a state-of-the-art standalone annotation system. Our results indicate that Genix is a useful tool that provides a more refined result and may be a user-friendly way to obtain high-quality results.
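The abstract compares each tool's output against a reference annotation in terms of false-positive and missing proteins, but does not state how those counts were computed. The following is only a minimal sketch of one possible way to tally them, assuming that a predicted feature counts as a match when its contig, coordinates, and strand are identical to those of a reference feature. The `Feature` type, the `compare_annotations` function, and the example coordinates are all hypothetical and are not taken from Genix or from the paper.

```python
from typing import Dict, Iterable, NamedTuple


class Feature(NamedTuple):
    """Hypothetical minimal description of an annotated coding feature."""
    contig: str
    start: int
    end: int
    strand: str


def compare_annotations(reference: Iterable[Feature],
                        predicted: Iterable[Feature]) -> Dict[str, int]:
    """Count shared, false-positive, and missing features.

    Assumption: two features match only if all four fields are identical.
    The paper may have used a different matching criterion.
    """
    ref = set(reference)
    pred = set(predicted)
    return {
        "shared": len(ref & pred),          # present in both annotations
        "false_positive": len(pred - ref),  # predicted but absent from the reference
        "missing": len(ref - pred),         # in the reference but not predicted
    }


# Made-up example data, for illustration only.
reference = [
    Feature("contig1", 1, 300, "+"),
    Feature("contig1", 500, 900, "-"),
    Feature("contig1", 1200, 1500, "+"),
]
predicted = [
    Feature("contig1", 1, 300, "+"),
    Feature("contig1", 500, 900, "-"),
    Feature("contig1", 2000, 2300, "+"),
]

result = compare_annotations(reference, predicted)
print(result)
```

Exact-coordinate matching is the strictest choice; a real evaluation might instead tolerate differing start codons or match on protein sequence, which would change the counts.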