Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany.
Glogauer Straße 31, D-10999 Berlin, Germany.
Nucleic Acids Res. 2017 Jul 3;45(W1):W6-W11. doi: 10.1093/nar/gkx391.
We have developed the web application GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) for the rapid and accurate annotation of organellar genome sequences, in particular chloroplast genomes. In contrast to existing tools, GeSeq combines batch processing with a fully customizable selection of reference sequences, drawn from organellar genome records at NCBI and/or from references uploaded by the user. For the annotation of chloroplast genomes, the application additionally provides an integrated database of manually curated reference sequences. GeSeq identifies genes and other feature-encoding regions by BLAT-based homology searches and, additionally, by profile HMM searches for protein-coding and rRNA genes and by two de novo predictors for tRNA genes. These unique features enable the user to conveniently compare the annotations produced by different state-of-the-art methods, thus supporting high-quality annotations. The main output of GeSeq is a GenBank file that usually requires little curation and is instantly visualized by OGDRAW. GeSeq also offers a variety of optional additional outputs that facilitate downstream analyses, for example comparative genomic or phylogenetic studies.
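As an illustration of what comparing the annotations of different methods can look like in practice, the following minimal Python sketch matches gene calls from two sources by name and checks how well their coordinates agree. It is not GeSeq's implementation: the names `Call`, `overlap_fraction` and `compare_calls`, the 0.9 overlap threshold, and all coordinates are assumptions made for this example.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One predicted feature (hypothetical record layout)."""
    gene: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    method: str  # e.g. "blat" or "hmm"


def overlap_fraction(a: Call, b: Call) -> float:
    """Return intersection length divided by union length of two intervals."""
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start) + 1
    return inter / union


def compare_calls(first, second, min_overlap=0.9):
    """Classify each gene as agreeing, differing in boundaries, or method-specific.

    Assumes each gene name occurs at most once per method.
    """
    by_gene = {c.gene: c for c in second}
    report = {}
    for call in first:
        other = by_gene.get(call.gene)
        if other is None:
            report[call.gene] = "only in " + call.method
        elif overlap_fraction(call, other) >= min_overlap:
            report[call.gene] = "agree"
        else:
            report[call.gene] = "boundaries differ"
    for call in second:
        if call.gene not in report:
            report[call.gene] = "only in " + call.method
    return report


# Made-up coordinates, for illustration only.
blat_calls = [Call("psbA", 100, 1161, "blat"),
              Call("rbcL", 5000, 6400, "blat"),
              Call("matK", 2000, 3500, "blat")]
hmm_calls = [Call("psbA", 100, 1161, "hmm"),
             Call("rbcL", 5000, 5900, "hmm"),
             Call("atpA", 8000, 9500, "hmm")]

for gene, status in compare_calls(blat_calls, hmm_calls).items():
    print(gene, status)
```

Genes flagged as "boundaries differ" or found by only one method are the ones most likely to need manual curation in the resulting GenBank file.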