EGASP: the human ENCODE Genome Annotation Assessment Project.

Authors

Guigó Roderic, Flicek Paul, Abril Josep F, Reymond Alexandre, Lagarde Julien, Denoeud France, Antonarakis Stylianos, Ashburner Michael, Bajic Vladimir B, Birney Ewan, Castelo Robert, Eyras Eduardo, Ucla Catherine, Gingeras Thomas R, Harrow Jennifer, Hubbard Tim, Lewis Suzanna E, Reese Martin G

Affiliation

Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain.

Publication information

Genome Biol. 2006;7 Suppl 1(Suppl 1):S2.1-31. doi: 10.1186/gb-2006-7-s1-s2. Epub 2006 Aug 7.

DOI:10.1186/gb-2006-7-s1-s2
PMID:16925836
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1810551/
Abstract

BACKGROUND

We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment.

RESULTS

The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.
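
The nucleotide-level figures above can be made concrete with a small sketch. It assumes the convention common in gene-finding evaluations, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP), the quantity called precision in other fields. The function names, the interval representation, and the toy coordinates are illustrative assumptions, not taken from the EGASP evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: 1-based, inclusive (start, end) coding intervals.
Interval = Tuple[int, int]


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of nucleotide positions covered by the intervals."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated: Iterable[Interval],
                        predicted: Iterable[Interval]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) at the coding nucleotide level.

    Assumed definitions (gene-finding convention):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    ann = covered_positions(annotated)
    pred = covered_positions(predicted)
    tp = len(ann & pred)   # coding in both annotation and prediction
    fn = len(ann - pred)   # annotated coding nucleotides that were missed
    fp = len(pred - ann)   # predicted coding nucleotides not in the annotation
    sensitivity = tp / (tp + fn) if ann else 0.0
    specificity = tp / (tp + fp) if pred else 0.0
    return sensitivity, specificity


# Toy data (made up for illustration): two annotated coding exons and a
# prediction that misses 10 nt of the first and overshoots the second by 10 nt.
annotated = [(101, 200), (301, 400)]
predicted = [(111, 200), (301, 410)]
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Building explicit position sets keeps the sketch short; for whole-genome data an interval-based overlap computation would be the more practical choice.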

CONCLUSION

This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/7a8a30554a44/gb-2006-7-s1-s2-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/aa70e8a7ad47/gb-2006-7-s1-s2-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/dfb2744f2cdb/gb-2006-7-s1-s2-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/5e0cbe6dce37/gb-2006-7-s1-s2-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/372b9d34de36/gb-2006-7-s1-s2-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/73495720b383/gb-2006-7-s1-s2-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/7990ffa5f9c2/gb-2006-7-s1-s2-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/0981ec1d6741/gb-2006-7-s1-s2-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/42e7a609cb24/gb-2006-7-s1-s2-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/f2ed9caf63ad/gb-2006-7-s1-s2-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/eeb5fc1f53d8/gb-2006-7-s1-s2-11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/dd78b47cd2fc/gb-2006-7-s1-s2-12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/6061a16b66fd/gb-2006-7-s1-s2-13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29a6/1810551/1d0962c9856c/gb-2006-7-s1-s2-14.jpg
