School of Computer Science and Engineering, Jiangsu University of Science and Technology, No. 2 Mengxi Road, Zhenjiang 212003, China.
Biomed Res Int. 2013;2013:239628. doi: 10.1155/2013/239628. Epub 2013 Aug 26.
DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem: most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on binary-class problems. In this paper, we addressed the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We used the one-against-all coding strategy to transform the multiclass problem into multiple binary-class problems, and applied feature subspace to each of them. Feature subspace is an evolved version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two correction techniques, namely decision threshold adjustment or random undersampling, into each training subset to mitigate the harm caused by class imbalance. A support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
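The data-preparation steps named in the abstract (one-against-all coding, feature subspace, and random undersampling) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain random subspace in place of the paper's feature subspace variant, undersamples the larger class down to the size of the smaller one, and leaves out SVM training, decision threshold adjustment, and counter voting, since the abstract does not specify them. All function names, parameters, and the toy data are hypothetical.

```python
import random


def one_against_all(labels, positive_class):
    """Recode multiclass labels as binary: 1 for the chosen class, 0 for the rest."""
    return [1 if y == positive_class else 0 for y in labels]


def random_feature_subspace(n_features, subspace_size, rng):
    """Pick a random subset of feature indices (plain random subspace, an assumption)."""
    return sorted(rng.sample(range(n_features), subspace_size))


def random_undersample(samples, binary_labels, rng):
    """Randomly discard samples of the larger class until both classes have equal size."""
    by_class = {0: [], 1: []}
    for i, y in enumerate(binary_labels):
        by_class[y].append(i)
    n_keep = min(len(by_class[0]), len(by_class[1]))
    kept = sorted(rng.sample(by_class[0], n_keep) + rng.sample(by_class[1], n_keep))
    return [samples[i] for i in kept], [binary_labels[i] for i in kept]


def build_training_subsets(samples, labels, n_subsets, subspace_size, seed=0):
    """For every class, build several balanced binary training subsets,
    each restricted to its own random feature subspace."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    subsets = {}
    for cls in sorted(set(labels)):
        binary = one_against_all(labels, cls)
        subsets[cls] = []
        for _ in range(n_subsets):
            features = random_feature_subspace(n_features, subspace_size, rng)
            projected = [[row[j] for j in features] for row in samples]
            X, y = random_undersample(projected, binary, rng)
            subsets[cls].append((features, X, y))
    return subsets


# Hypothetical toy data: 10 samples, 5 features, skewed across three classes.
toy_labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
toy_samples = [[float(i * 5 + j) for j in range(5)] for i in range(10)]

subsets = build_training_subsets(toy_samples, toy_labels, n_subsets=3, subspace_size=2)
for cls, members in subsets.items():
    for features, X, y in members:
        print(cls, "features:", features, "size:", len(y), "positives:", sum(y))
```

In a full pipeline, each `(features, X, y)` triple would train one base classifier, and the predictions of all classifiers would then be combined by a voting rule.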