University of Huddersfield, Queensgate, Huddersfield, United Kingdom.
Comput Methods Programs Biomed. 2017 Jul;146:11-24. doi: 10.1016/j.cmpb.2017.05.001. Epub 2017 May 4.
This paper examines the accuracy and efficiency (time complexity) of high-performance feature selection and classification algorithms applied to genetic data for colon cancer diagnosis. The research is motivated by the urgent and growing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, so it is vitally important that cancerous tissue be identified and classified quickly and accurately, both to ensure fast detection of the disease and to expedite the drug discovery process.
In this research, a three-phase approach was proposed and implemented: Phase One examined the feature selection algorithms and Phase Two the classification algorithms, each applied separately, and Phase Three examined the performance of their combination.
Phase One found that the Particle Swarm Optimization (PSO) algorithm performed best as a feature selection method on the colon dataset (29 genes selected), and Phase Two found that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. Phase Three found that the combined use of PSO and SVM surpassed the other algorithms in accuracy and performance, and was faster in terms of time analysis (94%).
It is concluded that applying feature selection algorithms before classification algorithms yields better accuracy than applying the classification algorithms alone. This conclusion is significant for both industry and society.
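The idea of running feature selection before classification can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which PSO variant was used, so a binary PSO with sigmoid-sampled bits is assumed here; a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library; and the data are synthetic rather than the colon dataset. All function names and parameter values (make_data, loo_accuracy, fitness, binary_pso, the penalty weight, swarm size) are hypothetical.

```python
import math
import random

def make_data(rng, n_samples=40, n_features=12, n_informative=3):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
            dist = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)

def binary_pso(X, y, rng, n_particles=10, n_iters=15, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a 0/1 mask over features."""
    n = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n):
                # Pull the velocity toward the personal and global best bits.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, vel[i][j]))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

rng = random.Random(0)
X, y = make_data(rng)
mask, score = binary_pso(X, y, rng)
selected = [j for j, bit in enumerate(mask) if bit]
print("selected features:", selected)
print("LOO accuracy on selection:", loo_accuracy(X, y, mask))
print("LOO accuracy on all features:", loo_accuracy(X, y, [1] * len(X[0])))
```

The last two printed values give the comparison the conclusion describes, classification on the selected subset versus classification on all features, although on this toy data the outcome says nothing about the paper's reported figures.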