Forbes A D
Medical Department, Hewlett-Packard Laboratories, Palo Alto, CA 94303-0867, USA.
J Clin Monit. 1995 May;11(3):189-206. doi: 10.1007/BF01617722.
The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.
Two distinct contexts of classification are defined, involving "objects-by-inspection" and "objects-by-segmentation." In the former context, the total number of objects to be classified is unambiguously and self-evidently defined. In the latter, that total is troublesomely ambiguous. All five of the performance measures considered here are based on confusion matrices, tables of counts revealing the extent of an algorithm's "confusion" regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements and should obey six additional constraints.
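As an illustration of the confusion matrices described above, the following minimal sketch tallies one from paired true and assigned labels. It assumes the objects-by-inspection context, where every object is known in advance; the function name, the label values, and the row/column convention (rows are true classes, columns are assigned classes) are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def confusion_matrix(true_labels, assigned_labels, classes):
    """Tally a confusion matrix as a list of rows.

    Entry [i][j] counts the objects whose true class is classes[i]
    and whose assigned class is classes[j] (a convention assumed here).
    """
    if len(true_labels) != len(assigned_labels):
        raise ValueError("label sequences must have the same length")
    # Count each (true, assigned) pair once per object.
    counts = Counter(zip(true_labels, assigned_labels))
    return [[counts[(t, a)] for a in classes] for t in classes]


# Hypothetical labels for four objects.
true_labels = ["beat", "beat", "noise", "beat"]
assigned_labels = ["beat", "noise", "noise", "beat"]
matrix = confusion_matrix(true_labels, assigned_labels, ["beat", "noise"])
print(matrix)
```

In the objects-by-segmentation context, the number of objects is itself ambiguous, so a tally of this kind would first need a rule for deciding what counts as an object.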
Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements, but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.
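To make the idea of an information-based measure concrete, the sketch below computes one common form of normalized mutual information from a confusion matrix: the mutual information between true and assigned classes divided by the entropy of the true classes. This normalization is an assumption chosen for illustration; the abstract does not state the paper's exact definition of the NMI or of its modified form, and the function name and row/column convention are likewise hypothetical.

```python
import math


def normalized_mutual_information(confusion):
    """Return I(true; assigned) / H(true) for a confusion matrix.

    confusion[i][j] is the count of objects with true class i that were
    assigned class j. The normalization by H(true) is an assumption of
    this sketch, not necessarily the paper's definition.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no counts")
    row_sums = [sum(row) for row in confusion]        # true-class totals
    col_sums = [sum(col) for col in zip(*confusion)]  # assigned-class totals

    # Entropy of the true-class distribution.
    h_true = -sum((r / total) * math.log(r / total) for r in row_sums if r > 0)
    if h_true == 0:
        raise ValueError("only one true class is populated; NMI is undefined")

    # Mutual information between true and assigned classes.
    mutual_info = 0.0
    for i, row in enumerate(confusion):
        for j, n in enumerate(row):
            if n > 0:
                mutual_info += (n / total) * math.log(
                    n * total / (row_sums[i] * col_sums[j])
                )
    return mutual_info / h_true


perfect = normalized_mutual_information([[50, 0], [0, 50]])
uninformative = normalized_mutual_information([[25, 25], [25, 25]])
```

Under this normalization, a classifier that reproduces the true labels exactly scores 1, and one whose assignments are statistically independent of the true labels scores 0.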
The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms using a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.