Toward a gold standard for promoter prediction evaluation.

Author information

Abeel Thomas, Van de Peer Yves, Saeys Yvan

Affiliation

Department of Plant Systems Biology, VIB, Ghent University, Gent, Belgium.

Publication information

Bioinformatics. 2009 Jun 15;25(12):i313-20. doi: 10.1093/bioinformatics/btp191.

DOI:10.1093/bioinformatics/btp191

PMID:19478005

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2687945/

Abstract

MOTIVATION

Promoter prediction is an important task in genome annotation projects, and during the past years many new promoter prediction programs (PPPs) have emerged. However, many of these programs are compared inadequately to other programs. In most cases, only a small portion of the genome is used to evaluate the program, which is not a realistic setting for whole genome annotation projects. In addition, a common evaluation design to properly compare PPPs is still lacking.

RESULTS

We present a large-scale benchmarking study of 17 state-of-the-art PPPs. A multi-faceted evaluation strategy is proposed that can be used as a gold standard for promoter prediction evaluation, allowing authors of promoter prediction software to compare their method to existing methods in a proper way. This evaluation strategy is subsequently used to compare the chosen promoter predictors, and an in-depth analysis on predictive performance, promoter class specificity, overlap between predictors and positional bias of the predictions is conducted.

AVAILABILITY

We provide the implementations of the four protocols, as well as the datasets required to perform the benchmarks to the academic community free of charge on request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f2/2687945/4b08d8134a62/btp191f1.jpg
