Thomas Abeel, Yves Van de Peer, Yvan Saeys
Department of Plant Systems Biology, VIB, Ghent University, Gent, Belgium.
Bioinformatics. 2009 Jun 15;25(12):i313-20. doi: 10.1093/bioinformatics/btp191.
Promoter prediction is an important task in genome annotation projects, and many new promoter prediction programs (PPPs) have emerged in recent years. However, many of these programs are inadequately compared with other programs. In most cases, only a small portion of the genome is used to evaluate a program, which is not a realistic setting for whole-genome annotation projects. In addition, a common evaluation design for properly comparing PPPs is still lacking.
We present a large-scale benchmarking study of 17 state-of-the-art PPPs. We propose a multi-faceted evaluation strategy that can serve as a gold standard for promoter prediction evaluation, allowing authors of promoter prediction software to compare their methods with existing ones in a proper way. We then use this strategy to compare the chosen promoter predictors and conduct an in-depth analysis of predictive performance, promoter class specificity, overlap between predictors and positional bias of the predictions.
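The abstract does not spell out the four evaluation protocols, so the following is only a minimal, hypothetical sketch of one common way to score genome-wide promoter predictions: treat each prediction and each annotated transcription start site (TSS) as a single position on one sequence, and count a match when the two lie within a fixed window. The function names `window_evaluate` and `nearest_distance`, the 500 bp default window, and the decision to ignore strand and chromosome are all illustrative assumptions, not the authors' method.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, x):
    """Distance from x to the closest value in sorted_positions (inf if empty)."""
    if not sorted_positions:
        return float("inf")
    i = bisect_left(sorted_positions, x)
    best = float("inf")
    if i < len(sorted_positions):
        best = sorted_positions[i] - x  # closest position at or after x
    if i > 0:
        best = min(best, x - sorted_positions[i - 1])  # closest position before x
    return best


def window_evaluate(predicted, annotated, window=500):
    """Hypothetical window-based scoring of point predictions against annotated TSSs.

    Assumption: a prediction counts as correct if an annotated TSS lies within
    `window` bp of it, and an annotated TSS counts as recovered if some prediction
    lies within `window` bp of it. Strand and chromosome are ignored here.
    """
    pred = sorted(predicted)
    ann = sorted(annotated)
    correct = sum(1 for p in pred if nearest_distance(ann, p) <= window)
    recovered = sum(1 for a in ann if nearest_distance(pred, a) <= window)
    precision = correct / len(pred) if pred else 0.0
    recall = recovered / len(ann) if ann else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy positions, not data from the study.
    print(window_evaluate([1000, 5200, 9000], [1100, 5000, 20000], window=500))
```

On the toy input, two of the three predictions fall near an annotated TSS and two of the three TSSs are recovered, so precision and recall are both 2/3. A real benchmark would at least need per-chromosome and per-strand handling and a justified window size.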
The implementations of the four protocols, together with the datasets required to perform the benchmarks, are available free of charge to the academic community on request.
Supplementary data are available at Bioinformatics online.