Vigneron Vincent, Maaref Hichem
Informatique, Bio-informatique et Systèmes Complexes (IBISC) EA 4526, univ Evry, Université Paris-Saclay, 40 rue du Pelvoux, 91020 Evry, France.
Entropy (Basel). 2019 Apr 26;21(5):440. doi: 10.3390/e21050440.
The goal of classifier combination is to merge the decisions of individual classifiers into a classifier that performs better than any of them alone. In this paper, we propose a method based on combining weak rank classifiers, because for a many-class problem a ranking carries more information than a single choice. We consider the problem of combining the decisions of several classifiers whose raw outputs are rankings of candidate classes, and formulate it as a general discrete optimization problem whose objective function is based on the distance between the data and the consensus decision. This formulation uses certain performance statistics about the joint behavior of the ensemble of classifiers. Assuming that each classifier produces a ranked list of classes, an initial approach leads to a binary linear programming problem with a simple, globally optimal solution. The consensus function can be viewed as a mapping from a set of individual rankings to a combined ranking, which yields the most relevant decision. We also propose an information measure that quantifies the degree of consensus between the classifiers, in order to assess the strength of the combination rule used. The method is easy to implement and does not require any training. The main conclusion is that combining rank classifiers globally strongly improves the classification rate. The proposed algorithm is tested on real cytology image data for the detection of cervical cancer.
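As a rough illustration of the consensus-ranking idea described in the abstract, the following minimal sketch finds the combined ranking that minimizes the total distance to the individual rankings. It is not the authors' formulation: the distance (the sum of absolute rank differences, i.e. the Spearman footrule), the function name `consensus_ranking`, and the toy data are all assumptions made for illustration. The search over permutations covers the same feasible set as a binary assignment program, but brute force is only practical for a handful of classes.

```python
from itertools import permutations


def consensus_ranking(rankings):
    """Return (consensus, total_distance) for a list of rankings.

    rankings[k][c] is the rank (0 = best) that classifier k gives to class c.
    The footrule distance used here is an assumption, not the paper's metric.
    """
    n_classes = len(rankings[0])
    # cost[c][pos]: summed disagreement if class c is placed at consensus rank pos
    cost = [
        [sum(abs(r[c] - pos) for r in rankings) for pos in range(n_classes)]
        for c in range(n_classes)
    ]
    best_perm, best_cost = None, float("inf")
    # Each permutation is one feasible 0/1 assignment of classes to ranks.
    for perm in permutations(range(n_classes)):  # perm[c] = consensus rank of class c
        total = sum(cost[c][perm[c]] for c in range(n_classes))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Hypothetical output of three rank classifiers over four classes.
rankings = [
    [0, 1, 2, 3],
    [0, 2, 1, 3],
    [1, 0, 2, 3],
]
consensus, distance = consensus_ranking(rankings)
top_class = consensus.index(0)  # class placed first by the consensus
```

Because the cost is a sum of per-class, per-rank terms, the same problem can be handed to a linear assignment solver when the number of classes grows; the exhaustive loop is kept here only so the sketch needs nothing beyond the standard library.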