Galba: genome annotation with miniprot and AUGUSTUS.

Affiliations

U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.

Department of Data Sciences, Dana-Farber Cancer Institute, Boston, 02215, MA, USA.

Publication details

BMC Bioinformatics. 2023 Aug 31;24(1):327. doi: 10.1186/s12859-023-05449-z.

DOI:10.1186/s12859-023-05449-z
PMID:37653395
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10472564/
Abstract

BACKGROUND

The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes.

RESULTS

Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments.
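The RESULTS paragraph describes miniprot as a protein-to-genome aligner whose output AUGUSTUS builds on. As a rough illustration of the gene-structure evidence such alignments carry, the Python sketch below reads simplified GFF3 CDS records and infers introns as the gaps between consecutive CDS segments of the same alignment. This is not GALBA's actual filtering or hint-generation code. The input is only loosely modeled on miniprot's GFF3 output (the attribute set is simplified and assumed), and the `min_intron` cutoff, which drops short gaps such as those that can arise around frameshifts, is a hypothetical parameter.

```python
from collections import defaultdict


def introns_from_gff3(lines, min_intron=20):
    """Infer intron intervals from CDS records of protein-to-genome alignments.

    Illustrative sketch only (not GALBA's code). Assumes tab-separated GFF3
    with 1-based, inclusive coordinates, where each CDS carries a Parent=
    attribute naming its alignment. `min_intron` is a hypothetical cutoff:
    gaps shorter than this are not reported as introns.
    """
    segments = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # only CDS features define the exon chain here
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue  # cannot group a CDS without its parent alignment
        segments[parent].append((fields[0], int(fields[3]), int(fields[4]), fields[6]))

    introns = {}
    for parent, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # genomic order, independent of strand
        introns[parent] = [
            (a[0], a[2] + 1, b[1] - 1, a[3])  # gap between consecutive CDS
            for a, b in zip(segs, segs[1:])
            if b[1] - a[2] - 1 >= min_intron
        ]
    return introns


# Toy input: one alignment (ID, protein name, coordinates all invented)
# with three CDS segments.
EXAMPLE = [
    "##gff-version 3",
    "chr1\tminiprot\tmRNA\t100\t899\t.\t+\t.\tID=MP000001;Target=protA 1 200",
    "chr1\tminiprot\tCDS\t100\t300\t.\t+\t0\tParent=MP000001",
    "chr1\tminiprot\tCDS\t401\t600\t.\t+\t0\tParent=MP000001",
    "chr1\tminiprot\tCDS\t701\t899\t.\t+\t1\tParent=MP000001",
]

if __name__ == "__main__":
    print(introns_from_gff3(EXAMPLE))
    # {'MP000001': [('chr1', 301, 400, '+'), ('chr1', 601, 700, '+')]}
```

Sorting segments by start coordinate keeps the inferred introns in genomic order on both strands; orientation is carried only in the strand column.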

CONCLUSIONS

Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.

Figures
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/739a/10472564/278549bf0deb/12859_2023_5449_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/739a/10472564/bb3742d26c1b/12859_2023_5449_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/739a/10472564/89e7dfd80c51/12859_2023_5449_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/739a/10472564/59c17e69bcaf/12859_2023_5449_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/739a/10472564/72002d30c55f/12859_2023_5449_Fig5_HTML.jpg

Similar articles

1
Galba: genome annotation with miniprot and AUGUSTUS.
BMC Bioinformatics. 2023 Aug 31;24(1):327. doi: 10.1186/s12859-023-05449-z.
2
GALBA: Genome Annotation with Miniprot and AUGUSTUS.
bioRxiv. 2023 Apr 10:2023.04.10.536199. doi: 10.1101/2023.04.10.536199.
3
BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA.
Genome Res. 2024 Jun 25;34(5):769-777. doi: 10.1101/gr.278090.123.
4
BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA.
bioRxiv. 2024 Feb 29:2023.06.10.544449. doi: 10.1101/2023.06.10.544449.
5
Multi-Genome Annotation with AUGUSTUS.
Methods Mol Biol. 2019;1962:139-160. doi: 10.1007/978-1-4939-9173-0_8.
6
7
Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data.
BMC Bioinformatics. 2017 Jan 27;18(Suppl 1):1426. doi: 10.1186/s12859-016-1426-6.
8
Comparison of RefSeq protein-coding regions in human and vertebrate genomes.
BMC Genomics. 2013 Sep 25;14:654. doi: 10.1186/1471-2164-14-654.
9
Predicting Genes in Single Genomes with AUGUSTUS.
Curr Protoc Bioinformatics. 2019 Mar;65(1):e57. doi: 10.1002/cpbi.57. Epub 2018 Nov 22.
10
Whole-Genome Annotation with BRAKER.
Methods Mol Biol. 2019;1962:65-95. doi: 10.1007/978-1-4939-9173-0_5.

Cited by

1
Draft genome sequence of strain P18 isolated from cattle in Japan.
Microbiol Resour Announc. 2025 Sep 11;14(9):e0054425. doi: 10.1128/mra.00544-25. Epub 2025 Jul 31.
2
Chromosome-level genome assembly of the Vermilion Snapper (Rhomboplites aurorubens).
Sci Data. 2025 Jul 23;12(1):1281. doi: 10.1038/s41597-025-05573-w.
3
Annotation matters: the effect of structural gene annotation on orthology inference.

References

1
Quality assessment of gene repertoire annotations with OMArk.
Nat Biotechnol. 2025 Jan;43(1):124-133. doi: 10.1038/s41587-024-02147-w. Epub 2024 Feb 21.
2
The structure of the tetraploid sour cherry 'Schattenmorelle' ( L.) genome reveals insights into its segmental allopolyploid nature.
Front Plant Sci. 2023 Dec 1;14:1284478. doi: 10.3389/fpls.2023.1284478. eCollection 2023.
3
Welcome to the big leaves: Best practices for improving genome annotation in non-model plant genomes.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf365.
4
The gene regulatory mechanisms shaping the heterogeneity of venom production in the Cape coral snake.
Genome Biol. 2025 May 19;26(1):130. doi: 10.1186/s13059-025-03602-w.
5
The First Genome Assembly Of The Dogwhelk Nucella lapillus, a Bioindicator Species For The Marine Environment.
Sci Data. 2025 Apr 28;12(1):704. doi: 10.1038/s41597-025-04764-9.
6
Leveraging Synteny to Generate Reference Genomes for Conservation: Assembling the Genomes of Hector's and Māui Dolphins.
Mol Ecol Resour. 2025 Oct;25(7):e14109. doi: 10.1111/1755-0998.14109. Epub 2025 Apr 4.
7
Near telomere-to-telomere genome assembly of the blackspot tuskfish (Choerodon schoenleinii).
Sci Data. 2025 Mar 31;12(1):537. doi: 10.1038/s41597-025-04893-1.
8
The first de novo genome assembly and annotation of a green-blooded skink (Prasinohaema aff. flavipes) from a historic museum sample.
J Hered. 2025 Aug 23;116(5):653-662. doi: 10.1093/jhered/esaf014.
9
A chromosome-scale genome assembly of the pioneer plant Stylosanthes angustifolia: insights into genome evolution and drought adaptation.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giae118.
10
Parallel gene expansions drive rapid dietary adaptation in herbivorous woodrats.
Science. 2025 Jan 10;387(6730):156-162. doi: 10.1126/science.adp7978. Epub 2025 Jan 9.
Welcome to the big leaves: Best practices for improving genome annotation in non-model plant genomes.
Appl Plant Sci. 2023 Aug 8;11(4):e11533. doi: 10.1002/aps3.11533. eCollection 2023 Jul-Aug.
4
Protein-to-genome alignment with miniprot.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad014.
5
UniProt: the Universal Protein Knowledgebase in 2023.
Nucleic Acids Res. 2023 Jan 6;51(D1):D523-D531. doi: 10.1093/nar/gkac1052.
6
OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity.
Nucleic Acids Res. 2023 Jan 6;51(D1):D445-D451. doi: 10.1093/nar/gkac998.
7
Standards recommendations for the Earth BioGenome Project.
Proc Natl Acad Sci U S A. 2022 Jan 25;119(4). doi: 10.1073/pnas.2115639118.
8
BUSCO: Assessing Genomic Data Quality and Beyond.
Curr Protoc. 2021 Dec;1(12):e323. doi: 10.1002/cpz1.323.
9
The Sequence Read Archive: a decade more of explosive growth.
Nucleic Acids Res. 2022 Jan 7;50(D1):D387-D390. doi: 10.1093/nar/gkab1053.
10
TSEBRA: transcript selector for BRAKER.
BMC Bioinformatics. 2021 Nov 25;22(1):566. doi: 10.1186/s12859-021-04482-0.