Cho Ji-Hoon, Lee Dongkwon, Park Jin Hyun, Lee In-Beum
Department of Chemical Engineering, Pohang University of Science and Technology, San 31 Hyoja-Dong, 790-784 Pohang, South Korea.
FEBS Lett. 2003 Sep 11;551(1-3):3-7. doi: 10.1016/s0014-5793(03)00819-6.
In this work we propose a new method for finding gene subsets of microarray data that effectively discriminate subtypes of disease. To address large within-class variation, a well-known problem in gene selection, we developed a new criterion that measures the relevance of individual genes using the mean and standard deviation of the distances from each sample to its class centroid. The approach also has the advantage of being applicable not only to binary classification but also to multiclass problems. We demonstrated the performance of the method by applying it to two publicly available microarray datasets: leukemia (two classes) and small round blue cell tumors (four classes). Compared with previous methods, the proposed method selects a very small number of genes without loss of discriminating power, and thus it can effectively facilitate further biological and clinical research.
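The abstract does not give the exact form of the criterion, so the following is only a minimal sketch of the general idea, not the authors' formula. It assumes a per-gene score defined as the class-size-weighted distance of the class centroids from the overall centroid, divided by the sum of the mean and standard deviation of the sample-to-centroid distances. The function names `gene_relevance_scores` and `select_top_genes`, the choice of absolute difference as the distance, and the small constant in the denominator are all illustrative assumptions.

```python
import numpy as np


def gene_relevance_scores(X, y):
    """Score each gene (column of X) for class separation.

    X: array of shape (n_samples, n_genes) with expression values.
    y: array of shape (n_samples,) with class labels (two or more classes).

    ASSUMPTION: this score is an illustrative stand-in, not the
    criterion defined in the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall = X.mean(axis=0)          # overall centroid, per gene
    within = np.empty_like(X)         # sample-to-class-centroid distances
    between = np.zeros(X.shape[1])    # weighted centroid-to-overall distances

    for c in np.unique(y):
        mask = y == c
        centroid = X[mask].mean(axis=0)
        within[mask] = np.abs(X[mask] - centroid)
        between += mask.sum() * np.abs(centroid - overall)
    between /= len(y)

    # Penalize genes whose within-class distances are large on average
    # or highly variable; the constant avoids division by zero.
    spread = within.mean(axis=0) + within.std(axis=0)
    return between / (spread + 1e-12)


def select_top_genes(X, y, k):
    """Return the column indices of the k highest-scoring genes."""
    scores = gene_relevance_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

Because the score is computed from class centroids rather than from a two-group statistic, the same function handles both the two-class and the four-class setting mentioned above; only the label vector `y` changes.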