School of Electrical and Computer Engineering, National Technical University of Athens, Greece.
Comput Biol Med. 2013 Dec;43(12):2118-26. doi: 10.1016/j.compbiomed.2013.09.016. Epub 2013 Sep 28.
Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prevent patients from becoming blood donors. Since these diseases can be fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes: Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP). It is implemented using two separate binary classification levels: the first level performs the Healthy/non-Healthy classification and the second level performs the PP/SP classification. To maximize the classifier's performance, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented. The algorithm consists of two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as the input for the next stage. The FM stage uses a floating size technique to search for an even better solution by varying the initially provided subset size. A Support Vector Machine (SVM) classifier is then used to discriminate the data at each classification level. The proposed classification system is compared with well-established feature selection techniques, such as the Sequential Floating Forward Selection (SFFS) and the Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques, such as the Multilayer Perceptron (MLP) and the SVM classifier. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, reaching up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second classification level. Moreover, it provides excellent robustness regardless of the size of the input feature subset used.
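
The two-stage search described above can be illustrated with a minimal sketch. This is not the published LM-FM procedure, whose details the abstract does not give. It assumes that the fixed-size stage is a one-feature swap hill climb and that the floating stage adds or drops one feature at a time while the score improves. The function names, the toy scoring function, and the feature indices are all hypothetical. In a real wrapper scheme, the score would be the classifier's validation accuracy on the candidate subset.

```python
from typing import Callable, FrozenSet

Score = Callable[[FrozenSet[int]], float]


def local_maximization(n_features: int, k: int, score: Score) -> FrozenSet[int]:
    """Fixed-size stage (assumed form): swap one feature at a time while the score improves."""
    current = frozenset(range(k))  # arbitrary starting subset of size k
    best = score(current)
    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for inn in range(n_features):
                if inn in current:
                    continue
                candidate = (current - {out}) | {inn}
                s = score(candidate)
                if s > best:
                    # Accept the first improving swap and restart the scan.
                    current, best, improved = candidate, s, True
                    break
            if improved:
                break
    return current


def floating_maximization(n_features: int, start: FrozenSet[int], score: Score) -> FrozenSet[int]:
    """Floating-size stage (assumed form): add or drop one feature while the score improves."""
    current, best = start, score(start)
    while True:
        # Neighbours one feature larger, then neighbours one feature smaller.
        neighbours = [current | {f} for f in range(n_features) if f not in current]
        if len(current) > 1:
            neighbours += [current - {f} for f in sorted(current)]
        if not neighbours:
            return current
        candidate = max(neighbours, key=score)
        if score(candidate) <= best:
            return current  # no neighbouring subset scores strictly higher
        current, best = candidate, score(candidate)


# Toy stand-in for classifier accuracy (hypothetical): reward "relevant"
# features and penalise irrelevant ones.
RELEVANT = frozenset({1, 4, 6})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & RELEVANT) - 0.25 * len(subset - RELEVANT)


fixed_size_subset = local_maximization(n_features=8, k=2, score=toy_score)
selected = floating_maximization(n_features=8, start=fixed_size_subset, score=toy_score)
print(sorted(selected))  # prints [1, 4, 6]
```

In this toy run the fixed-size stage can hold only two of the three relevant features, and the floating stage then grows the subset to include the third. This mirrors the stated purpose of varying the initially provided subset size.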