Chang Siow-Wee, Kareem Sameem Abdul, Kallarakkal Thomas George, Merican Amir Feisal Merican Aljunid, Abraham Mannil Thomas, Zain Rosnah Binti
Department of Artificial Intelligence, Faculty of Computer Science and Information Technology and Bioinformatics Division, Institute of Biological Science, University of Malaya, Kuala Lumpur, Malaysia.
Asian Pac J Cancer Prev. 2011;12(10):2659-64.
The incidence of oral cancer is high among people of Indian ethnic origin in Malaysia. Various clinical and pathological data are typically used in oral cancer prognosis. However, because of time, cost and tissue limitations, the number of prognostic variables needs to be reduced. In this research, we demonstrate the use of feature selection methods to select a subset of variables that is highly predictive of oral cancer prognosis. The objective is to reduce the number of input variables and thereby identify the key clinicopathologic (input) variables of oral cancer prognosis, based on data collected in the Malaysian setting. Two feature selection methods, a genetic algorithm (wrapper approach) and Pearson's correlation coefficient (filter approach), were implemented and compared with single-input models and a full-input model. The results showed that the reduced models obtained with feature selection produced more accurate prognosis results than the full-input and single-input models, with Pearson's correlation coefficient achieving the most promising results.
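The filter approach named in the abstract can be illustrated with a minimal sketch: score each input variable by the absolute value of its Pearson correlation with the outcome, then keep the highest-scoring variables. This is not the authors' implementation. The abstract does not say how many variables were retained or what cut-off was applied, so the parameter `k`, the function names `pearson_r` and `select_top_k`, the variable names and the toy data below are all assumptions made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant variable carries no linear information; score it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(features, outcome, k):
    """Rank variables by |r| against the outcome and return the top k.

    `features` maps a variable name to its list of values.
    `k` is a hypothetical cut-off, not a value taken from the paper.
    """
    scores = {
        name: abs(pearson_r(values, outcome))
        for name, values in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Made-up toy data: three placeholder variables and a binary outcome.
toy_features = {
    "var_a": [1, 2, 3, 4, 5, 6],
    "var_b": [2, 1, 2, 1, 2, 1],
    "var_c": [6, 5, 4, 3, 2, 2],
}
toy_outcome = [0, 0, 0, 1, 1, 1]

selected, scores = select_top_k(toy_features, toy_outcome, k=2)
print(selected)
for name in selected:
    print(name, round(scores[name], 3))
```

A filter of this kind scores each variable independently of any prognostic model, which makes it cheap to compute. A wrapper such as the genetic algorithm instead evaluates candidate subsets by training and testing a model on each one, so it costs more but can account for interactions between variables.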