Brůna Tomáš, Hoff Katharina J, Lomsadze Alexandre, Stanke Mario, Borodovsky Mark
School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany.
NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa108. doi: 10.1093/nargab/lqaa108. eCollection 2021 Mar.
The task of eukaryotic genome annotation remains challenging. Only a few genomes can serve as annotation standards, a status achieved through a tremendous investment of human curation effort. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, remains a good subject for further investigation. The new BRAKER2 pipeline generates external protein support and integrates it into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, in which self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. Under equal conditions, it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate the harmonization of protein-coding gene annotation across genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies, as well as in algorithmic development, to reach the goal of highly accurate annotation of eukaryotic genomes.