Cho Ji-Hoon, Lee Dongkwon, Park Jin Hyun, Lee In-Beum
Department of Chemical Engineering, Pohang University of Science and Technology, San 31 Hyoja-Dong, Pohang 790-784, Republic of Korea.
FEBS Lett. 2004 Jul 30;571(1-3):93-8. doi: 10.1016/j.febslet.2004.05.087.
Discriminating cancer patients (including subtypes) on the basis of gene expression data is a critical problem with clinical ramifications. Central to solving it is the question of how to extract the most relevant genes from the several thousand genes on a typical microarray. Here, we propose a methodology that selects an informative subset of genes and classifies the subtypes (or patients) of a disease using the selected genes. We employ a kernel machine, kernel Fisher discriminant analysis (KFDA), for discrimination and use the derivatives of the kernel function to perform gene selection. Using a modified form of KFDA in the minimum squared error (MSE) sense together with the gradients of the kernel functions, we construct a gene selection criterion. We assess the performance of the proposed methodology by applying it to three gene expression datasets: leukemia, breast cancer, and colon cancer. Using only a few informative genes, the proposed method classified cancer subtypes (or patients) accurately and reliably. A comparison study further verifies the reliability of the gene selection and discrimination results.
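The general idea of combining an MSE-form kernel discriminant with kernel gradients can be illustrated with a minimal sketch. It is not the criterion from the paper, whose exact form the abstract does not give. The sketch assumes a Gaussian (RBF) kernel, class labels coded as +1/-1, a ridge penalty on the expansion coefficients, and a gene score defined as the mean absolute gradient of the discriminant over the training samples. The function names, the synthetic data, and the parameter values (`gamma`, `ridge`) are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_mse_kfda(X, y, gamma, ridge=1.0):
    """Least-squares fit of f(x) = sum_i alpha_i k(x_i, x) + b to labels y.

    Assumption: a ridge penalty on alpha (not on b) regularizes the fit.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    A = np.hstack([K, np.ones((n, 1))])   # columns: kernel terms, then bias
    R = ridge * np.eye(n + 1)
    R[-1, -1] = 0.0                       # leave the bias unpenalized
    sol = np.linalg.solve(A.T @ A + R, A.T @ y)
    return sol[:-1], sol[-1]              # alpha, b

def decision(X_train, alpha, b, gamma, X_new):
    """Evaluate the discriminant f at the rows of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha + b

def gene_scores(X, alpha, gamma):
    """Mean absolute gradient of f with respect to each gene.

    For the RBF kernel, df/dx_g at x_j equals
    -2 * gamma * sum_i alpha_i k(x_j, x_i) (x_jg - x_ig).
    """
    W = rbf_kernel(X, X, gamma) * alpha[None, :]
    grad = -2.0 * gamma * (W.sum(1)[:, None] * X - W @ X)
    return np.abs(grad).mean(0)

# Synthetic illustration (not real microarray data):
# 100 samples, 20 "genes", only genes 0 and 1 carry class information.
rng = np.random.default_rng(0)
n, p = 100, 20
y = np.repeat([1.0, -1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, :2] += 2.0 * y[:, None]              # class-dependent mean shift

gamma = 0.005                             # wide kernel, chosen by hand
alpha, b = fit_mse_kfda(X, y, gamma)
scores = gene_scores(X, alpha, gamma)
ranking = np.argsort(scores)[::-1]        # genes sorted by decreasing score
train_acc = np.mean(np.sign(decision(X, alpha, b, gamma, X)) == y)
```

In practice, one would keep the top-ranked genes, refit the discriminant on that subset, and estimate accuracy on held-out samples; the kernel width and ridge strength would normally be tuned by cross-validation rather than fixed by hand as here.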