Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA.

Authors

Djebali Sarah, Delaplace Franck, Roest Crollius Hugues

Affiliation

Dyogen Lab, CNRS UMR8541, Ecole Normale Supérieure, 46 rue d'Ulm, 75005 Paris, France.

Publication

Genome Biol. 2006;7 Suppl 1(Suppl 1):S7.1-10. doi: 10.1186/gb-2006-7-s1-s7. Epub 2006 Aug 7.

DOI: 10.1186/gb-2006-7-s1-s7
PMID: 16925841
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1810556/

Abstract

BACKGROUND

Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism.

RESULTS

We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts.
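
As a rough illustration of the data structure named above, the following Python sketch models a directed acyclic colored multigraph. Everything in it is an assumption made for illustration: the class name `DACM`, the node names, and the color labels `same_mRNA` and `same_protein` are hypothetical, and the abstract does not say how Exogean defines its nodes, edges, or colors.

```python
from collections import defaultdict


class DACM:
    """Toy directed acyclic colored multigraph (hypothetical, not Exogean's code)."""

    def __init__(self):
        # edges[u] holds (v, color) pairs; several edges with different
        # colors may join the same pair of nodes (multigraph).
        self.edges = defaultdict(list)
        self.nodes = set()

    def _reachable(self, start, target):
        # Iterative depth-first search from start, looking for target.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(v for v, _ in self.edges.get(node, []))
        return False

    def add_edge(self, u, v, color):
        # Reject any edge that would close a cycle, so the graph stays acyclic.
        if self._reachable(v, u):
            raise ValueError(f"edge {u}->{v} would create a cycle")
        self.nodes.update((u, v))
        self.edges[u].append((v, color))

    def successors(self, u, color):
        # Follow only the edges of one color, i.e. one kind of relationship.
        return [v for v, c in self.edges.get(u, []) if c == color]


# Hypothetical example: exons linked by two kinds of evidence.
g = DACM()
g.add_edge("exon_a", "exon_b", "same_mRNA")
g.add_edge("exon_a", "exon_b", "same_protein")  # parallel edge, other color
g.add_edge("exon_b", "exon_c", "same_mRNA")
```

In this sketch a color stands for one kind of relationship between objects, so a rule can traverse the graph along a single color at a time. Whether Exogean's rules operate this way is not stated in the abstract.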

CONCLUSION

We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c441/1810556/14eb30c9cd2b/gb-2006-7-s1-s7-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c441/1810556/240459d4901b/gb-2006-7-s1-s7-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c441/1810556/51319822482d/gb-2006-7-s1-s7-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c441/1810556/a1761c817f39/gb-2006-7-s1-s7-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c441/1810556/7e20f637720a/gb-2006-7-s1-s7-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c441/1810556/fd56451d6d52/gb-2006-7-s1-s7-6.jpg
