Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA.
Department of Biochemistry, University at Buffalo-State University of New York, 701 Ellicott St, Buffalo, NY, 14203, USA.
BMC Bioinformatics. 2019 Apr 5;20(1):174. doi: 10.1186/s12859-019-2781-x.
Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches usefully complement empirical methods for CRM discovery, but effective means are needed to evaluate their performance, specifically to estimate their sensitivity and specificity.
We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tool that is flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares a method's predictions with the extensive existing knowledge of experimentally validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. For supervised prediction methods, in which training data composed of validated CRMs are used, pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval by evaluating our SCRMshaw CRM prediction method and its training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that increases the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to increasingly less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.
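The comparison of predictions against validated CRMs can be illustrated with a minimal sketch. This is not the pCRMeval implementation: the function names, the half-open coordinate convention, the `min_bp` overlap threshold, and the simple overlap-based definitions of precision and relative sensitivity are all assumptions made for illustration, and the coordinates in the usage example are made up.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval, min_bp: int = 1) -> bool:
    """Return True if a and b are on the same chromosome and share >= min_bp bases."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return shared >= min_bp


def evaluate(predictions: List[Interval],
             known_crms: List[Interval],
             min_bp: int = 1) -> Tuple[float, float]:
    """Return (precision, relative_sensitivity) under the assumed definitions.

    precision: fraction of predictions overlapping at least one known CRM.
    relative_sensitivity: fraction of known CRMs hit by at least one prediction.
    """
    hit_predictions = sum(
        any(overlaps(p, k, min_bp) for k in known_crms) for p in predictions
    )
    recovered_crms = sum(
        any(overlaps(k, p, min_bp) for p in predictions) for k in known_crms
    )
    precision = hit_predictions / len(predictions) if predictions else 0.0
    sensitivity = recovered_crms / len(known_crms) if known_crms else 0.0
    return precision, sensitivity


# Hypothetical coordinates, for illustration only.
predictions = [("2L", 100, 600), ("2L", 5000, 5500), ("3R", 100, 600)]
known_crms = [("2L", 500, 900), ("3R", 1000, 1500)]
precision, sensitivity = evaluate(predictions, known_crms)
```

The quadratic scan over all pairs keeps the sketch short. For genome-scale prediction sets, an interval tree or a sorted sweep would be the more practical choice.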
Our pCRMeval pipeline provides a general evaluation framework that can be applied to any CRM prediction method, particularly a supervised one. While we use it here primarily to test and improve a particular CRM prediction method, SCRMshaw, pCRMeval should provide the research community with a valuable platform not only for evaluating individual methods but also for comparing competing methods.