WISE Lab., Division of Information and Computer Engineering, Ajou University, Suwon, Kyeonggi 443-749, Korea.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Mar-Apr;8(2):316-25. doi: 10.1109/TCBB.2010.96.
In biomedical data, the imbalanced data problem occurs frequently and causes poor prediction performance for minority classes, because classifiers trained on such data are dominated by the majority class. In this paper, we describe an ensemble learning method combined with active example selection to resolve the imbalanced data problem. Our method consists of three key components: 1) an active example selection algorithm that chooses informative examples for training the classifier, 2) an ensemble learning method that combines the classifier variants produced by active example selection, and 3) an incremental learning scheme that speeds up the iterative training procedure of active example selection. We evaluate the method on six real-world imbalanced data sets from biomedical domains and show that the proposed method outperforms both random undersampling and ensemble-with-undersampling methods. Compared with other approaches to the imbalanced data problem, our method improves the AUC by 0.03 to 0.15.
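The three components can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the base classifier, the informativeness criterion, or the combination rule, so the sketch assumes the following. Logistic regression is the base learner. "Informative" means a majority-class example whose score is closest to 0.5. The incremental step is warm-starting from the previous iteration's weights rather than retraining from scratch. The ensemble averages the members' probabilities. All function names (`train_logreg`, `active_member`, `active_ensemble`, `ensemble_score`, `auc`) and parameters (`seed_size`, `batch`, `n_members`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Probability of the minority (positive) class; w[-1] is the bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logreg(X, y, w=None, epochs=50, lr=0.1):
    """Batch gradient descent for logistic regression.
    Passing a previous weight vector `w` warm-starts training (assumed stand-in
    for the paper's incremental learning scheme)."""
    d = len(X[0])
    w = list(w) if w is not None else [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def active_member(minority, majority, rng, seed_size=5, batch=5):
    """One ensemble member (hypothetical selection rule): start from all minority
    examples plus a few random majority examples, then repeatedly add the majority
    examples the current model is least certain about (score closest to 0.5)
    until the number of selected majority examples reaches the minority count."""
    pool = list(majority)
    rng.shuffle(pool)
    chosen, pool = pool[:seed_size], pool[seed_size:]
    w = None
    while True:
        X = minority + chosen
        y = [1] * len(minority) + [0] * len(chosen)
        w = train_logreg(X, y, w)  # warm start from the previous iteration
        if len(chosen) >= len(minority) or not pool:
            return w
        pool.sort(key=lambda x: abs(score(w, x) - 0.5))
        chosen, pool = chosen + pool[:batch], pool[batch:]


def active_ensemble(minority, majority, n_members=5, seed=0):
    # Different random seed subsets yield different members (classifier variants).
    rng = random.Random(seed)
    return [active_member(minority, majority, rng) for _ in range(n_members)]


def ensemble_score(members, x):
    # Combine members by averaging their probability estimates.
    return sum(score(w, x) for w in members) / len(members)


def auc(pos_scores, neg_scores):
    # Probability that a random positive outranks a random negative (ties count 1/2).
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy 1:10 imbalanced data: synthetic Gaussians, not one of the paper's data sets.
    rng = random.Random(1)
    minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]
    majority = [(rng.gauss(-1, 1), rng.gauss(-1, 1)) for _ in range(200)]
    members = active_ensemble(minority, majority)
    pos = [ensemble_score(members, x) for x in minority]
    neg = [ensemble_score(members, x) for x in majority]
    print("training AUC:", auc(pos, neg))
```

Each member sees a balanced training set, so no single classifier is dominated by the majority class. Warm-starting keeps each selection round cheap, because the weights only need to move a little after a few examples are added.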