BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

Authors

Hoff Katharina J, Lange Simone, Lomsadze Alexandre, Borodovsky Mark, Stanke Mario

Affiliations

Ernst Moritz Arndt Universität Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany.

Joint Georgia Tech and Emory University Wallace H Coulter Department of Biomedical Engineering, Atlanta, GA 30332, USA.

Publication information

Bioinformatics. 2016 Mar 1;32(5):767-9. doi: 10.1093/bioinformatics/btv661. Epub 2015 Nov 11.

DOI:10.1093/bioinformatics/btv661

PMID:26559507

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6078167/

Abstract

MOTIVATION

Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. The complementary strengths of GeneMark-ET and AUGUSTUS motivated the design of a new combined tool for automatic gene prediction.

RESULTS

We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a BAM file containing spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS is trained on these predicted genes and then integrates RNA-Seq read information into the final gene predictions. In our experiments, BRAKER1 was more accurate than MAKER2 when RNA-Seq data were the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step.
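The two inputs described above can be illustrated with a minimal Python sketch. It is not part of the paper. The first function shows what "spliced alignment" means in practice: in the SAM/BAM format, an intron appears as an `N` operation in a read's CIGAR string. The second function assembles a `braker.pl` call from a genome file and a BAM file. The `--genome` and `--bam` option names are taken from the BRAKER documentation rather than from this abstract, so treat them as an assumption. The file names are hypothetical.

```python
from __future__ import annotations

import re
import shlex

# CIGAR operations as defined in the SAM specification. 'N' marks a skipped
# reference region, which spliced aligners use to represent an intron.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar: str) -> list[int]:
    """Return the lengths of all 'N' (intron) gaps in one CIGAR string."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"]


def build_braker_command(genome: str, bam: str) -> list[str]:
    """Assemble a braker.pl call from the two inputs named in the abstract.

    Assumption: the --genome and --bam option names follow the BRAKER
    documentation; the abstract itself does not state them.
    """
    return ["braker.pl", f"--genome={genome}", f"--bam={bam}"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    command = build_braker_command("genome.fa", "rnaseq.bam")
    print(shlex.join(command))

    # A read aligned across one intron: 50 matched bases, a 1200-base gap,
    # then 50 more matched bases.
    print(intron_lengths("50M1200N50M"))
```

A read whose CIGAR contains no `N` operation yields an empty list, meaning it carries no intron evidence. The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting issues.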

AVAILABILITY AND IMPLEMENTATION

BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/

CONTACT

katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
