Hoff Katharina J, Lange Simone, Lomsadze Alexandre, Borodovsky Mark, Stanke Mario
Ernst Moritz Arndt Universität Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany.
Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA 30332, USA.
Bioinformatics. 2016 Mar 1;32(5):767-9. doi: 10.1093/bioinformatics/btv661. Epub 2015 Nov 11.
Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that reaches state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. The complementary strengths of GeneMark-ET and AUGUSTUS motivated the design of a new combined tool for automatic gene prediction.
We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a BAM file with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses these predicted genes for training and then integrates RNA-Seq read information into the final gene predictions. In our experiments, BRAKER1 was more accurate than MAKER2 when RNA-Seq was used as the sole source of evidence for training and prediction. BRAKER1 requires neither pre-trained parameters nor a separate expert-prepared training step.
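The RNA-Seq evidence used in both steps comes from spliced alignments: in the SAM/BAM format, a gap skipped over an intron is encoded as an `N` operation in the read's CIGAR string. The sketch below shows how intron coordinates and their read support can be recovered from (sequence name, position, CIGAR) records. It is an illustration of the principle only, not BRAKER1's own code; the function names `introns_from_cigar` and `count_introns` are hypothetical, and reading an actual BAM file (e.g. with a SAM/BAM library) is assumed to happen elsewhere.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference (genome) sequence.
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(pos, cigar):
    """Return 1-based, inclusive (start, end) intron spans implied by N operations.

    pos is the 1-based leftmost mapping position of the read (SAM POS field).
    """
    introns = []
    ref = pos
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "N":
            # Skipped reference region: interpreted here as an intron.
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def count_introns(alignments):
    """Count how many reads support each intron.

    alignments: iterable of (seqname, pos, cigar) tuples (hypothetical input form).
    Returns a Counter mapping (seqname, start, end) to the number of supporting reads.
    """
    counts = Counter()
    for seqname, pos, cigar in alignments:
        for start, end in introns_from_cigar(pos, cigar):
            counts[(seqname, start, end)] += 1
    return counts


if __name__ == "__main__":
    reads = [("chr1", 100, "50M200N50M"), ("chr1", 120, "30M200N70M")]
    for (seq, start, end), n in count_introns(reads).items():
        print(seq, start, end, n)
```

Both example reads span the same 200 bp gap, so the intron chr1:150-349 is reported with a support of 2. Counting read support per intron is a simple way to separate well-supported splice sites from spurious single-read alignments.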
BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/
Contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu
Supplementary data are available at Bioinformatics online.