Vergara Ismael A, Norambuena Tomás, Ferrada Evandro, Slater Alex W, Melo Francisco
Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile.
BMC Bioinformatics. 2008 Jun 5;9:265. doi: 10.1186/1471-2105-9-265.
As in many other areas of science and technology, most important problems in bioinformatics depend on the proper development and assessment of binary classifiers. A generalized assessment of the performance of binary classifiers is typically carried out by analyzing their receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) is a popular indicator of the performance of a binary classifier. However, assessing the statistical significance of the difference between any two classifiers based on this measure is not straightforward, since few freely available tools exist for the task. Most existing software is either not free, difficult to use, or hard to automate when the performance of many binary classifiers must be compared. This is the typical scenario both when optimizing the parameters of a new classifier and when validating its performance against prior art.
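To make the AUC concrete: the empirical AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney statistic). The sketch below computes it directly from two lists of scores; the function name `auc` is hypothetical and is not part of the software described here.

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC computed as the Mann-Whitney statistic.

    Each (positive, negative) pair contributes 1 if the positive is scored
    higher, 0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for x in pos_scores:
        for y in neg_scores:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Five of the six positive/negative pairs are correctly ordered.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

This pairwise loop costs O(m·n) comparisons for m positives and n negatives; a rank-based formulation gives the same value faster on large datasets.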
In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers built for a common task, estimated from either paired data or unpaired balanced data. The software can perform pairwise comparisons of many classifiers in a single run and requires no expert or advanced knowledge to use. It relies on a non-parametric test for the difference between AUCs that accounts for the correlation between the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated, and the complete data resulting from the analysis are also available for download for further analysis with other software. The software is released both as a web server, usable from any client platform, and as a standalone application for the Linux operating system.
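A well-known non-parametric test of this kind for paired data is that of DeLong et al. (1988), which estimates the covariance between two correlated AUCs from per-instance "placement" values. The sketch below assumes that formulation; whether the released software uses exactly this variant is not stated here, and the unpaired balanced case is not covered. The function names `delong_paired` and `compare_all` and the dictionary input format are hypothetical.

```python
import itertools
import math


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _placements(pos, neg):
    # DeLong structural components: per-positive and per-negative kernel means.
    # Each list averages to the empirical AUC.
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    # Unbiased sample covariance of two equal-length sequences.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_paired(pos1, neg1, pos2, neg2):
    """Two-sided test of AUC1 - AUC2 on paired data (assumed DeLong variant).

    pos1[i] and pos2[i] must be the two classifiers' scores for the same
    positive instance, and likewise for the negatives. Returns (diff, p).
    """
    if len(pos1) != len(pos2) or len(neg1) != len(neg2):
        raise ValueError("paired test needs the same instances for both classifiers")
    m, n = len(pos1), len(neg1)
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")
    v10_1, v01_1 = _placements(pos1, neg1)
    v10_2, v01_2 = _placements(pos2, neg2)
    diff = sum(v10_1) / m - sum(v10_2) / m
    # Var(A1 - A2) = Var(A1) + Var(A2) - 2 Cov(A1, A2), split into the
    # positive-instance (S10 / m) and negative-instance (S01 / n) terms.
    var = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n
    )
    if var <= 0.0:
        # Degenerate variance (e.g. identical score vectors): the normal
        # approximation is undefined, so no p-value is reported.
        return diff, float("nan")
    z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under N(0, 1)
    return diff, p


def compare_all(pos_scores, neg_scores):
    """Run the paired test for every pair of classifiers in one pass.

    pos_scores / neg_scores map a classifier name to its scores on the shared
    positive / negative instances (hypothetical input format).
    """
    results = {}
    for a, b in itertools.combinations(sorted(pos_scores), 2):
        results[(a, b)] = delong_paired(
            pos_scores[a], neg_scores[a], pos_scores[b], neg_scores[b]
        )
    return results
```

Because every classifier is scored on the same instances, the covariance term reduces the variance of the difference whenever the two classifiers tend to rank the same instances similarly; ignoring it would make the test overly conservative.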
New software for the statistical comparison of ROC curves is released here as a web server and as standalone software for the Linux operating system.