Shen Qi, Mei Zhen, Ye Bao-Xian
Department of Chemistry, Zhengzhou University, Zhengzhou, China.
Comput Biol Med. 2009 Jul;39(7):646-9. doi: 10.1016/j.compbiomed.2009.04.008. Epub 2009 May 28.
Gene expression datasets are a means to classify and predict the diagnostic categories of a patient. Selecting informative genes and selecting representative samples are two important aspects of reducing gene expression data. Identifying and pruning redundant genes and samples simultaneously can improve classification performance and circumvent the local optima problem. In the present paper, a modified particle swarm optimization was applied to select optimal genes and samples simultaneously, and a support vector machine was used as the objective function to determine the optimum set of genes and samples. To evaluate its performance, the proposed method was applied to three publicly available microarray datasets. The results demonstrate that the proposed method for gene and sample selection is a useful tool for mining high-dimensional data.
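The idea of searching for genes and samples at the same time can be illustrated with a minimal sketch. The abstract does not give the details of the modified particle swarm optimization, so the code below assumes a standard binary PSO with a sigmoid update, and it swaps the support vector machine for a simple nearest-centroid classifier so that it runs with the Python standard library alone. The function names (`centroid_accuracy`, `binary_pso`), the held-out validation set, and the parameter values (inertia 0.7, acceleration 1.5, velocity clamp 4.0) are all illustrative assumptions, not the authors' settings.

```python
import math
import random


def centroid_accuracy(train, labels, valid, valid_labels, genes, samples):
    """Validation accuracy of a nearest-centroid classifier.

    Assumption: this is a lightweight stand-in for the SVM objective.
    Only the selected gene columns and selected training rows are used.
    """
    if not genes:
        return 0.0
    sums, counts = {}, {}
    for i in samples:
        acc = sums.setdefault(labels[i], [0.0] * len(genes))
        for k, g in enumerate(genes):
            acc[k] += train[i][g]
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    if len(sums) < 2:  # selected samples must cover at least two classes
        return 0.0
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    hits = 0
    for row, y in zip(valid, valid_labels):
        pred = min(
            centroids,
            key=lambda c: sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
        hits += pred == y
    return hits / len(valid)


def binary_pso(train, labels, valid, valid_labels, n_particles=10, n_iter=20, seed=0):
    """Standard binary PSO over one bit mask covering genes and samples."""
    rng = random.Random(seed)
    n_genes, n_samples = len(train[0]), len(train)
    dim = n_genes + n_samples  # first n_genes bits: genes; remaining bits: samples

    def fitness(mask):
        genes = [j for j in range(n_genes) if mask[j]]
        samples = [i for i in range(n_samples) if mask[n_genes + i]]
        return centroid_accuracy(train, labels, valid, valid_labels, genes, samples)

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for p in range(n_particles):
            for d in range(dim):
                v = (0.7 * vel[p][d]
                     + 1.5 * rng.random() * (pbest[p][d] - pos[p][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[p][d]))
                vel[p][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[p][d]))
                pos[p][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[p])
            if f > pbest_fit[p]:
                pbest[p], pbest_fit[p] = pos[p][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[p][:], f
    return gbest[:n_genes], gbest[n_genes:], gbest_fit
```

Calling `binary_pso(train, labels, valid, valid_labels)` with the expression matrices given as lists of rows returns a gene mask, a sample mask, and the best validation accuracy found. Encoding genes and samples in a single particle is what lets one search prune both at once; replacing `centroid_accuracy` with an SVM-based score would bring the sketch closer to the objective function named in the abstract.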