Kupinski M A, Anastasio M A
Kurt Rossmann Laboratories, Department of Radiology, The University of Chicago, IL 60637, USA.
IEEE Trans Med Imaging. 1999 Aug;18(8):675-85. doi: 10.1109/42.796281.
It is well understood that binary classifiers have two implicit objective functions (sensitivity and specificity) describing their performance. Traditional methods of classifier training attempt to combine these two objective functions (or two analogous class performance measures) into one so that conventional scalar optimization techniques can be utilized. This involves incorporating a priori information into the aggregation method so that the resulting performance of the classifier is satisfactory for the task at hand. We have investigated the use of a niched Pareto multiobjective genetic algorithm (GA) for classifier optimization. With niched Pareto GAs, an objective vector is optimized instead of a scalar function, eliminating the need to aggregate classification objective functions. The niched Pareto GA returns a set of optimal solutions that are equivalent in the absence of any information regarding the preferences of the objectives. The a priori knowledge that was used for aggregating the objective functions in conventional classifier training can instead be applied post-optimization to select one solution from the set returned by the multiobjective genetic optimization. We have applied this technique to train a linear classifier and an artificial neural network (ANN), using simulated datasets. The performances of the solutions returned from the multiobjective genetic optimization represent a series of optimal (sensitivity, specificity) pairs, which can be thought of as operating points on a receiver operating characteristic (ROC) curve. All possible ROC curves for a given dataset and classifier lie on or below the ROC curve generated by the niched Pareto genetic optimization.
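To illustrate the idea, the sketch below is a minimal niched Pareto tournament GA in Python/NumPy that evolves the weights of a linear classifier on simulated two-class Gaussian data and reports the nondominated (sensitivity, specificity) pairs of the final population. It is not the authors' implementation: the dataset, population size, sharing radius, and operators are illustrative assumptions chosen only to make the procedure concrete.

```python
# Minimal sketch (not from the paper): niched Pareto tournament GA for a linear
# classifier with (sensitivity, specificity) as the objective vector. All
# simulation parameters below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Simulated dataset: class 1 ("abnormal") vs. class 0 ("normal"), 2-D Gaussians.
n = 500
X = np.vstack([rng.normal(0.0, 1.0, (n, 2)),        # class 0
               rng.normal(1.0, 1.0, (n, 2))])       # class 1
y = np.concatenate([np.zeros(n), np.ones(n)])

def objectives(chrom):
    """Return (sensitivity, specificity) of the linear classifier w.x + b > 0."""
    w, b = chrom[:2], chrom[2]
    pred = (X @ w + b) > 0
    sens = np.mean(pred[y == 1])          # true-positive fraction
    spec = np.mean(~pred[y == 0])         # true-negative fraction
    return np.array([sens, spec])

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives maximized)."""
    return np.all(a >= b) and np.any(a > b)

def niche_count(obj, all_obj, sigma_share=0.05):
    """Fitness sharing: crowding of a solution within sigma_share in objective space."""
    d = np.linalg.norm(all_obj - obj, axis=1)
    return np.sum(np.maximum(0.0, 1.0 - d / sigma_share))

def pareto_tournament(pop, obj, t_dom=10):
    """Niched Pareto tournament: two candidates are checked for domination against a
    random comparison set; ties are broken by the smaller niche count."""
    i, j = rng.choice(len(pop), 2, replace=False)
    comp = obj[rng.choice(len(pop), t_dom, replace=False)]
    i_dom = any(dominates(c, obj[i]) for c in comp)
    j_dom = any(dominates(c, obj[j]) for c in comp)
    if i_dom and not j_dom:
        return pop[j]
    if j_dom and not i_dom:
        return pop[i]
    # Tie: prefer the less crowded candidate to maintain spread along the front.
    return pop[i] if niche_count(obj[i], obj) < niche_count(obj[j], obj) else pop[j]

# Evolve a population of chromosomes (w1, w2, b).
pop = rng.normal(0.0, 1.0, (100, 3))
for gen in range(50):
    obj = np.array([objectives(c) for c in pop])
    children = []
    for _ in range(len(pop)):
        p1, p2 = pareto_tournament(pop, obj), pareto_tournament(pop, obj)
        child = np.where(rng.random(3) < 0.5, p1, p2)    # uniform crossover
        child = child + rng.normal(0.0, 0.1, 3)          # Gaussian mutation
        children.append(child)
    pop = np.array(children)

# The nondominated members of the final population approximate the optimal
# (sensitivity, specificity) operating points, i.e. an estimated ROC curve.
obj = np.array([objectives(c) for c in pop])
front = [o for o in obj if not any(dominates(o2, o) for o2 in obj)]
for sens, spec in sorted({(round(s, 3), round(p, 3)) for s, p in front}):
    print(f"sensitivity={sens:.3f}  specificity={spec:.3f}")
```

Note that no weighting of sensitivity against specificity is needed during the search; any task-specific preference (e.g., a required minimum sensitivity) is applied only afterward, by picking one operating point from the printed nondominated set.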