Faculty of Computers and Information, Benha University, Egypt.
Faculty of Computers and Information, Cairo University, Egypt; Scientific Research Group in Egypt (SRGE), Egypt.
Comput Methods Programs Biomed. 2014 Feb;113(2):465-73. doi: 10.1016/j.cmpb.2013.11.004. Epub 2013 Nov 14.
Machine learning-based classification techniques support decision-making in many areas of health care, including diagnosis, prognosis, and screening. Feature selection (FS) is expected to improve classification performance, particularly when the data are high-dimensional, that is, when there are relatively few training examples compared to the number of measured features. In this paper, a random forest classifier (RFC) approach is proposed to diagnose lymph diseases. The first stage of the proposed system applies several feature selection algorithms, namely genetic algorithm (GA), Principal Component Analysis (PCA), Relief-F, Fisher, Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS), to reduce the dimension of the lymph diseases dataset. In the second stage, the resulting feature subsets are fed into the RFC for classification. GA-RFC achieved the highest classification accuracy, 92.2%. Using GA, the dimension of the input feature space was reduced from eighteen to six features.
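The first stage described in the abstract can be illustrated with one of the listed filter methods. The following is a minimal sketch of Fisher-score feature ranking, assuming the common definition of the score (between-class scatter divided by within-class scatter, computed per feature). It does not reproduce the paper's implementation or its GA-based result; the function names `fisher_scores` and `select_top_k` and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fisher_scores(X, y):
    """Return one Fisher score per feature column (higher = more discriminative)."""
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = mean(column)
        between = 0.0  # class-size-weighted spread of class means
        within = 0.0   # summed squared deviations inside each class
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = mean(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring features; return their indices and the reduced data."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[row[j] for j in keep] for row in X]
    return keep, reduced


# Hypothetical toy data: 6 samples, 3 features, 2 classes (not the lymph dataset).
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.0, 4.5, 0.4],
    [3.1, 3.5, 0.3],
    [2.9, 4.0, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

keep, reduced = select_top_k(X, y, k=2)
print(keep)  # indices of the retained features
```

In a two-stage setup like the one the abstract describes, the reduced feature matrix would then be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`) in place of the full feature set.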