Joseph Sandeep J, Robbins Kelly R, Zhang Wensheng, Rekaya Romdhane
Rhodes Centre for Animal and Dairy Science, University of Georgia, Athens, GA 30605, USA.
Cancer Inform. 2010 Mar 10;9:39-48. doi: 10.4137/cin.s3827.
Multi-class cancer classification based on microarray data is described. A generalized output-coding scheme based on One Versus One (OVO) combined with a Latent Variable Model (LVM) is used. Results from the proposed OVO output-coding strategy are compared with those obtained from the generalized One Versus All (OVA) method, and the efficiency of each for multi-class tumor classification is studied. This comparative study used two microarray gene expression datasets: the Global Cancer Map (GCM) dataset and a brain cancer (BC) dataset. Primary feature selection was based on fold change and penalized t-statistics. Evaluation was conducted with varying numbers of features. The OVO coding strategy worked quite well with the BC data, while OVO and OVA results appeared similar for the GCM data. The selection of an output-coding method for combining binary classifiers in multi-class tumor classification depends on the number of tumor types considered, the discrepancies between the tumor samples used for training, and the heterogeneity of expression within the cancer subtypes used as training data.
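To make the two output-coding schemes concrete, the following is a minimal sketch of how OVA and OVO code matrices can be built and how a set of binary outputs can be decoded into a class label. It is an illustration only, not the authors' implementation: the function names (`ova_code_matrix`, `ovo_code_matrix`, `decode`) are hypothetical, the decoding rule is a simple agreement score, and the binary outputs are assumed to be signed scores from any binary classifier rather than from the latent variable model used in the study.

```python
from itertools import combinations


def ova_code_matrix(n_classes):
    """OVA: one column per class; that class is +1, every other class is -1."""
    return [[1 if row == col else -1 for col in range(n_classes)]
            for row in range(n_classes)]


def ovo_code_matrix(n_classes):
    """OVO: one column per class pair (a, b); a is +1, b is -1, others are 0."""
    pairs = list(combinations(range(n_classes), 2))
    return [[1 if row == a else -1 if row == b else 0 for a, b in pairs]
            for row in range(n_classes)]


def decode(binary_outputs, code_matrix):
    """Return the class whose code row agrees most with the binary outputs.

    binary_outputs holds one signed score per column (positive means the
    binary classifier favoured the +1 side). Zero code entries contribute
    nothing, so a class is scored only on the problems it took part in.
    This agreement score is an assumed decoding rule for illustration.
    """
    scores = [sum(code * out for code, out in zip(row, binary_outputs))
              for row in code_matrix]
    return max(range(len(scores)), key=scores.__getitem__)


# Three classes: OVA needs 3 binary problems, OVO needs 3 pairs (0,1), (0,2), (1,2).
ova = ova_code_matrix(3)
ovo = ovo_code_matrix(3)

# Hypothetical outputs: pair (0,1) favours class 0, pairs (0,2) and (1,2) favour class 2.
predicted = decode([1, -1, -1], ovo)
```

With k classes, OVA trains k binary classifiers, each on all samples, while OVO trains k(k-1)/2 classifiers, each on the samples of two classes only. This difference is one reason the preferred scheme can depend on the number of tumor types and on heterogeneity within each class.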