CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.

Authors

Parra Genis, Bradnam Keith, Korf Ian

Affiliation

UC Davis Genome Center, University of California Davis, Davis, CA 95616, USA.

Publication details

Bioinformatics. 2007 May 1;23(9):1061-7. doi: 10.1093/bioinformatics/btm071. Epub 2007 Mar 1.

DOI:10.1093/bioinformatics/btm071
PMID:17332020
Abstract

MOTIVATION

The numbers of finished and ongoing genome projects are increasing at a rapid rate, and providing the catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many new genome projects lack comprehensive experimental data to derive a reliable initial set of genes.

RESULTS

In this study, we report a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data. We define a set of conserved protein families that occur in a wide range of eukaryotes, and present a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence. CEGMA includes the use of profile-hidden Markov models to ensure the reliability of the gene structures. Our procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages.
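The abstract says that profile hidden Markov models are used to ensure the reliability of the mapped gene structures, but it does not give the selection rule. The sketch below illustrates one plausible form of such a filter: for each conserved family, keep the best-scoring candidate gene model whose profile-HMM score meets a per-family cutoff. The `Candidate` type, the `select_core_genes` function, the family names, the scores, and the cutoffs are all hypothetical and are not taken from CEGMA itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Candidate:
    """A candidate gene model mapped for one conserved family (hypothetical type)."""
    family: str       # identifier of the conserved protein family
    locus: str        # free-form label for the genomic location
    hmm_score: float  # score of the predicted protein against the family profile HMM


def select_core_genes(
    candidates: Iterable[Candidate],
    cutoffs: Dict[str, float],
) -> Dict[str, Candidate]:
    """Return, per family, the highest-scoring candidate that meets its cutoff.

    Families with no cutoff, or with no candidate reaching the cutoff,
    are left out of the result. This rule is an assumption for illustration.
    """
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        cutoff = cutoffs.get(cand.family)
        if cutoff is None or cand.hmm_score < cutoff:
            continue  # unknown family or unreliable gene structure
        current = best.get(cand.family)
        if current is None or cand.hmm_score > current.hmm_score:
            best[cand.family] = cand
    return best


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = [
        Candidate("FAM1", "scaffold1:100-900", 250.0),
        Candidate("FAM1", "scaffold7:50-700", 120.0),
        Candidate("FAM2", "scaffold3:10-400", 40.0),
    ]
    example_cutoffs = {"FAM1": 150.0, "FAM2": 60.0}
    for family, cand in select_core_genes(example, example_cutoffs).items():
        print(family, cand.locus, cand.hmm_score)
```

Keeping at most one gene model per family favors reliability over completeness, which matches the stated goal of an initial, highly reliable annotation set; a real pipeline might handle paralogs and partial models differently.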

AVAILABILITY

Software and data sets are available online at http://korflab.ucdavis.edu/Datasets.
