BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

Author information

Hoff Katharina J, Lange Simone, Lomsadze Alexandre, Borodovsky Mark, Stanke Mario

Affiliations

Ernst Moritz Arndt Universität Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany.

Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA 30332, USA and.

Publication information

Bioinformatics. 2016 Mar 1;32(5):767-9. doi: 10.1093/bioinformatics/btv661. Epub 2015 Nov 11.

Abstract

MOTIVATION

Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction.

RESULTS

We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a BAM file with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses the predicted genes for training and then integrates RNA-Seq read information into the final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when using RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step.
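One kind of information that spliced RNA-Seq alignments carry is intron positions: in the SAM/BAM CIGAR string, an `N` operation marks a stretch of the reference that the read skips, i.e. an intron. The Python snippet below is a minimal, illustrative sketch of how intron coordinates and their read support could be pulled out of (chromosome, position, CIGAR) triples. It is not BRAKER1's own code. The function names are hypothetical, reading the BAM file itself (for example with a library such as pysam) is left out, and strand is left as `.` because inferring it is outside the scope of the sketch.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code (SAM specification).
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
# 'N' (skipped reference region) is how spliced aligners encode an intron.
_REF_CONSUMING = set("MDN=X")


def introns_from_alignment(chrom, pos, cigar, strand="."):
    """Return (chrom, start, end, strand) for each 'N' operation in one alignment.

    `pos` is the 1-based leftmost mapping position (SAM POS field);
    returned intron coordinates are 1-based and inclusive.
    """
    introns = []
    ref = pos
    for length_str, op in _CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((chrom, ref, ref + length - 1, strand))
        if op in _REF_CONSUMING:
            ref += length
    return introns


def collect_intron_support(alignments):
    """Count how many alignments support each distinct intron."""
    support = Counter()
    for chrom, pos, cigar in alignments:
        for intron in introns_from_alignment(chrom, pos, cigar):
            support[intron] += 1
    return support


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "50M200N50M"),
        ("chr1", 120, "30M200N70M"),
        ("chr1", 400, "100M"),  # unspliced read: contributes no intron
    ]
    for intron, count in collect_intron_support(reads).items():
        print(intron, count)
```

For example, the reads at chr1:100 with CIGAR `50M200N50M` and at chr1:120 with `30M200N70M` both imply the intron chr1:150-349, so that intron gets a support count of 2. Counting support per intron is a simple way to let well-supported splice junctions outweigh isolated, possibly spurious alignments.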

AVAILABILITY AND IMPLEMENTATION

BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/

CONTACT

katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Whole-Genome Annotation with BRAKER.
Methods Mol Biol. 2019;1962:65-95. doi: 10.1007/978-1-4939-9173-0_5.
TSEBRA: transcript selector for BRAKER.
BMC Bioinformatics. 2021 Nov 25;22(1):566. doi: 10.1186/s12859-021-04482-0.
