GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.

Authors

Brůna Tomáš, Lomsadze Alexandre, Borodovsky Mark

Affiliations

School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Publication information

NAR Genom Bioinform. 2020 Jun;2(2):lqaa026. doi: 10.1093/nargab/lqaa026. Epub 2020 May 13.

DOI: 10.1093/nargab/lqaa026
PMID: 32440658
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7222226/
Abstract

We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient gene finding, GeneMark-ES, with parameters trained in iterative mode. Next, in GeneMark-ET we proposed a method of integration of unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive protein mapping to genome and extracts hints to splice sites and translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve estimation of model parameters as well as to adjust coordinates of predicted genes if they disagree with the most reliable hints (the -EP+ mode). Tests of GeneMark-EP and -EP+ demonstrated improvements in gene prediction accuracy in comparison with GeneMark-ES, while the GeneMark-EP+ showed higher accuracy than GeneMark-ET. We have observed that the most pronounced improvements in gene prediction accuracy happened in large eukaryotic genomes.
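The abstract says that in the -EP+ mode predicted gene coordinates are adjusted when they disagree with the most reliable hints. The sketch below is a toy illustration of only the "find disagreements" step, restricted to intron hints. It is not the GeneMark-EP+ or ProtHint implementation: the `Intron` record, the `find_disagreements` function, the numeric hint scores, and the `min_score` threshold are all hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Intron:
    """Hypothetical intron record (coordinates are 1-based, inclusive)."""
    seqid: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Intron, b: Intron) -> bool:
    """True if two introns lie on the same sequence and strand and share a base."""
    return (
        a.seqid == b.seqid
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def find_disagreements(
    predicted: List[Intron],
    hints: Dict[Intron, float],
    min_score: float,
) -> List[Tuple[Intron, Intron]]:
    """Return (predicted, hint) pairs where a reliable hint overlaps a
    predicted intron without matching its coordinates exactly.

    "Reliable" is assumed here to mean score >= min_score; the real
    pipeline defines hint reliability in its own way.
    """
    reliable = [h for h, score in hints.items() if score >= min_score]
    reliable_set = set(reliable)
    conflicts = []
    for p in predicted:
        if p in reliable_set:
            continue  # prediction agrees exactly with a reliable hint
        for h in reliable:
            if overlaps(p, h):
                conflicts.append((p, h))
    return conflicts


# Toy data: the second predicted intron starts 10 bp before a reliable hint.
predicted = [Intron("chr1", 100, 200, "+"), Intron("chr1", 500, 650, "+")]
hints = {
    Intron("chr1", 100, 200, "+"): 0.90,
    Intron("chr1", 510, 650, "+"): 0.95,
    Intron("chr1", 490, 650, "+"): 0.20,  # low score, ignored at min_score=0.8
}
conflicts = find_disagreements(predicted, hints, min_score=0.8)
```

With this toy input, only the second predicted intron is reported, paired with the hint at 510-650. A correction step would then decide how to move the predicted boundary, which this sketch deliberately leaves out.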

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5848/7671314/fa4240165fe6/lqaa026fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5848/7671314/f3e9f22090f6/lqaa026fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5848/7671314/2f6e0484c6a6/lqaa026fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5848/7671314/1b6e89385be0/lqaa026fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5848/7671314/6ad386617a72/lqaa026fig5.jpg

Similar articles

1
GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.
NAR Genom Bioinform. 2020 Jun;2(2):lqaa026. doi: 10.1093/nargab/lqaa026. Epub 2020 May 13.
2
BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.
Bioinformatics. 2016 Mar 1;32(5):767-9. doi: 10.1093/bioinformatics/btv661. Epub 2015 Nov 11.
3
Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.
Nucleic Acids Res. 2014 Sep;42(15):e119. doi: 10.1093/nar/gku557. Epub 2014 Jul 2.
4
Whole-Genome Annotation with BRAKER.
Methods Mol Biol. 2019;1962:65-95. doi: 10.1007/978-1-4939-9173-0_5.
5
BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database.
NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa108. doi: 10.1093/nargab/lqaa108. eCollection 2021 Mar.
6
Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES.
Curr Protoc Bioinformatics. 2011 Sep;Chapter 4:4.6.1-4.6.10. doi: 10.1002/0471250953.bi0406s35.
7
A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.
bioRxiv. 2024 Apr 17:2023.01.13.524024. doi: 10.1101/2023.01.13.524024.
8
GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.
Genome Res. 2024 Jun 25;34(5):757-768. doi: 10.1101/gr.278373.123.
9
Probabilistic methods of identifying genes in prokaryotic genomes: connections to the HMM theory.
Brief Bioinform. 2004 Jun;5(2):118-30. doi: 10.1093/bib/5.2.118.
10
Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training.
Genome Res. 2008 Dec;18(12):1979-90. doi: 10.1101/gr.081612.108. Epub 2008 Aug 29.

Cited by

1
A High-Quality Chromosome-Level Genome Assembly and Comparative Analyses Provide Insights into the Adaptation of (Fabricius, 1794) (Diptera: Calliphoridae).
Biology (Basel). 2025 Jul 22;14(8):913. doi: 10.3390/biology14080913.
2
Chromosome-level assembly of cv. 'Tokiwa' as a reference genome of Japanese cucumber.
Breed Sci. 2025 Apr;75(2):85-92. doi: 10.1270/jsbbs.24066. Epub 2025 Mar 27.
3
A chromosomal-level genome assembly of Omiodes indicata Fabricius (Lepidoptera: Crambidae).
Sci Data. 2025 Aug 29;12(1):1514. doi: 10.1038/s41597-025-05644-y.
4
A chromosome-level genome assembly of Sarcophaga princeps Wiedemann, 1830 (Diptera: Sarcophagidae).
Sci Data. 2025 Aug 15;12(1):1433. doi: 10.1038/s41597-025-05785-0.
5
Genomes of nitrogen-fixing eukaryotes reveal an alternate path for organellogenesis.
Proc Natl Acad Sci U S A. 2025 Aug 19;122(33):e2507237122. doi: 10.1073/pnas.2507237122. Epub 2025 Aug 12.
6
Chromosome-level genome assembly of the autotetraploid yellow pitaya provides novel insights into evolution of trait patterning in pitaya species with different ploidy.
Genome Biol. 2025 Aug 6;26(1):234. doi: 10.1186/s13059-025-03695-3.
7
A chromosome-level genome assembly of Guimi No. 2 (Actinidia chinensis).
Sci Data. 2025 Jul 31;12(1):1334. doi: 10.1038/s41597-025-05593-6.
8
Starship giant transposons dominate plastic genomic regions in a fungal plant pathogen and drive virulence evolution.
Nat Commun. 2025 Jul 24;16(1):6806. doi: 10.1038/s41467-025-61986-6.
9
Chromosome-level genome assembly of the large carpenter bee Xylocopa dejeanii Lepeletier, 1841 (Hymenoptera: Apidae).
Sci Data. 2025 Jul 23;12(1):1280. doi: 10.1038/s41597-025-05641-1.
10
A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae).
Sci Data. 2025 Jul 17;12(1):1258. doi: 10.1038/s41597-025-05613-5.

References

1
VARUS: sampling complementary RNA reads from the sequence read archive.
BMC Bioinformatics. 2019 Nov 8;20(1):558. doi: 10.1186/s12859-019-3182-x.
2
EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes.
Methods Mol Biol. 2019;1962:97-120. doi: 10.1007/978-1-4939-9173-0_6.
3
Predicting Genes in Single Genomes with AUGUSTUS.
Curr Protoc Bioinformatics. 2019 Mar;65(1):e57. doi: 10.1002/cpbi.57. Epub 2018 Nov 22.
4
OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs.
Nucleic Acids Res. 2019 Jan 8;47(D1):D807-D811. doi: 10.1093/nar/gky1053.
5
APPRIS 2017: principal isoforms for multiple gene sets.
Nucleic Acids Res. 2018 Jan 4;46(D1):D213-D217. doi: 10.1093/nar/gkx997.
6
CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.
Nucleic Acids Res. 2017 Jan 4;45(D1):D200-D203. doi: 10.1093/nar/gkw1129. Epub 2016 Nov 29.
7
The Ensembl gene annotation system.
Database (Oxford). 2016 Jun 23;2016. doi: 10.1093/database/baw093. Print 2016.
8
Using intron position conservation for homology-based gene prediction.
Nucleic Acids Res. 2016 May 19;44(9):e89. doi: 10.1093/nar/gkw092. Epub 2016 Feb 17.
9
BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.
Bioinformatics. 2016 Mar 1;32(5):767-9. doi: 10.1093/bioinformatics/btv661. Epub 2015 Nov 11.
10
Fast and sensitive protein alignment using DIAMOND.
Nat Methods. 2015 Jan;12(1):59-60. doi: 10.1038/nmeth.3176. Epub 2014 Nov 17.