Wang Junbai, Bø Trond Hellem, Jonassen Inge, Myklebost Ola, Hovig Eivind
Department of Tumor Biology, The Norwegian Radium Hospital, N0310 Oslo, Norway.
BMC Bioinformatics. 2003 Dec 2;4:60. doi: 10.1186/1471-2105-4-60.
Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant).
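The last two stages of this pipeline can be illustrated with a minimal sketch, under stated assumptions: the SOM summarization step is omitted, the toy expression matrix is invented for illustration, and the function names `fuzzy_c_means` and `fisher_scores` are hypothetical rather than taken from the authors' software. The Fisher criterion is written in its common single-feature form, (mean_a - mean_b)^2 / (var_a + var_b), which may differ in detail from the form used in the paper.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / total for j in range(d)]
            )
        # Memberships follow from the relative distances to the centers.
        for i in range(n):
            dist = [
                max(sum((points[i][j] - centers[k][j]) ** 2 for j in range(d)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[k] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c)
                )
    return centers, u


def fisher_scores(expr, labels, a, b):
    """Per-gene Fisher criterion for the class pair (a, b); expr[sample][gene]."""
    scores = []
    for g in range(len(expr[0])):
        xa = [row[g] for row, lab in zip(expr, labels) if lab == a]
        xb = [row[g] for row, lab in zip(expr, labels) if lab == b]
        ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
        va = sum((x - ma) ** 2 for x in xa) / len(xa)
        vb = sum((x - mb) ** 2 for x in xb) / len(xb)
        # The small constant guards against division by zero for constant genes.
        scores.append((ma - mb) ** 2 / (va + vb + 1e-12))
    return scores


# Toy data (invented): 6 samples x 3 genes; only gene 0 separates the two groups.
expr = [
    [1.0, 5.0, 2.0],
    [1.2, 4.8, 2.1],
    [0.9, 5.2, 1.9],
    [5.0, 5.1, 2.0],
    [5.2, 4.9, 2.2],
    [4.9, 5.3, 1.8],
]

centers, memberships = fuzzy_c_means(expr, c=2)
# Harden the fuzzy memberships by taking the most likely cluster per sample.
hard = [row.index(max(row)) for row in memberships]
scores = fisher_scores(expr, hard, 0, 1)
best_gene = scores.index(max(scores))
print(hard, best_gene)
```

For more than two classes, the same score would be computed for each pair of classes and the top-ranked genes per pair taken as candidate markers, which is one reading of "pair-wise" here.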
The proposed models were tested on four published datasets: (1) leukemia, (2) colon cancer, (3) brain tumors, and (4) NCI cancer cell lines. The models gave class predictions with markedly reduced error rates compared to other class prediction approaches, and the results also emphasized the importance of feature selection in microarray data analysis.
Our models identify marker genes with predictive potential, often better than other methods available in the literature. The models are potentially useful for medical diagnostics and may offer insight into cancer classification. Additionally, we illustrate two limitations in tumor classification from microarray data that stem from the biology underlying the data: (1) the class size of the data, and (2) the internal structure of the classes. These limitations are not specific to the classification models used.