GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Affiliations

School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.

Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.

Publication information

Genome Res. 2024 Jun 25;34(5):757-768. doi: 10.1101/gr.278373.123.

DOI:10.1101/gr.278373.123
PMID:38866548
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11216313/
Abstract

Large-scale genomic initiatives, such as the Earth BioGenome Project, require efficient methods for eukaryotic genome annotation. Here we present an automatic gene finder, GeneMark-ETP, integrating genomic-, transcriptomic-, and protein-derived evidence that has been developed with a focus on large plant and animal genomes. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for making gene predictions with "high confidence." The genes situated in the genomic space between the high-confidence genes are predicted in the next stage. The set of high-confidence genes serves as an initial training set for the statistical model. Further on, the model parameters are iteratively updated in the rounds of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final predictions and delivers the whole complement of predicted genes. GeneMark-ETP outperforms gene finders using a single type of extrinsic evidence. Comparisons with gene finders MAKER2 and TSEBRA, those that use both transcript- and protein-derived extrinsic evidence, show that GeneMark-ETP delivers state-of-the-art gene-prediction accuracy, with the margin of outperforming existing approaches increasing in its application to larger and more complex eukaryotic genomes.
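
The iterative scheme described in the abstract (train on high-confidence genes, predict, re-estimate parameters, repeat until convergence) can be illustrated with a minimal sketch. This is not GeneMark-ETP's implementation: the function `iterative_training`, the `train` and `predict` callables, the `Gene` tuple, the rule that high-confidence genes are always retained, and the convergence test (an unchanged prediction set) are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Optional, Tuple

# Hypothetical gene representation: (sequence id, start, end).
Gene = Tuple[str, int, int]


def iterative_training(
    high_confidence: FrozenSet[Gene],
    train: Callable[[FrozenSet[Gene]], object],
    predict: Callable[[object], FrozenSet[Gene]],
    max_rounds: int = 10,
) -> FrozenSet[Gene]:
    """Alternate parameter estimation and prediction until predictions stop changing."""
    training_set = high_confidence           # initial training set
    previous: Optional[FrozenSet[Gene]] = None
    for _ in range(max_rounds):
        model = train(training_set)          # parameter (re-)estimation
        # Assumption: high-confidence genes are always kept in the output.
        predicted = predict(model) | high_confidence
        if predicted == previous:            # assumed convergence criterion
            break
        previous = predicted
        training_set = predicted             # next round trains on all predictions
    return previous if previous is not None else high_confidence


# Toy stand-ins: the "model" is just the mean gene length, and a candidate is
# accepted when its length is within 100 bp of that mean.
candidates = frozenset({("chr1", 100, 400), ("chr1", 1000, 1350), ("chr1", 2000, 2100)})
seed = frozenset({("chr1", 100, 400)})


def toy_train(genes: FrozenSet[Gene]) -> float:
    return sum(end - start for _, start, end in genes) / len(genes)


def toy_predict(mean_len: float) -> FrozenSet[Gene]:
    return frozenset(g for g in candidates if abs((g[2] - g[1]) - mean_len) <= 100)


result = iterative_training(seed, toy_train, toy_predict)
print(sorted(result))
```

In this toy run the loop stops in the second round, once re-estimating the mean length no longer changes the accepted set; a real gene finder would replace the two callables with statistical model training and genome-wide prediction.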

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed2f/11216313/70d5aef32078/757f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed2f/11216313/87773540d92e/757f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed2f/11216313/95edd47f51fd/757f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed2f/11216313/fa42260f5a34/757f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed2f/11216313/40f2002c6e41/757f05.jpg
