School of Medical Information, Wannan Medical College, Wuhu 241002, China.
School of Computer and Information, Anhui Normal University, Wuhu 241002, China.
Comput Math Methods Med. 2021 Apr 24;2021:5556992. doi: 10.1155/2021/5556992. eCollection 2021.
Ensemble learning combines multiple learners and offers good flexibility and higher generalization performance. To achieve higher-quality cancer classification, this study used the fast correlation-based feature selection (FCBF) method to preprocess the data and eliminate irrelevant and redundant features. Classification was then carried out with a stacking ensemble learner. A support vector machine (LIBSVM), k-nearest neighbor (KNN), decision tree C4.5 (C4.5), and random forest (RF) were used as the primary learners of the stacking ensemble. Because cancer gene expression data are imbalanced, an embedded cost-sensitive naive Bayes classifier was used as the metalearner of the stacking ensemble; the resulting model is denoted CSNB stacking. The proposed CSNB stacking method was applied to nine cancer datasets to verify its classification performance. Compared with other classification methods, including single-classifier algorithms and ensemble algorithms, the experimental results showed that the proposed method is effective and robust across different types of cancer data. This method may therefore help guide cancer diagnosis and research.
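The pipeline described above (feature selection, four primary learners, and a cost-aware naive Bayes metalearner) can be sketched with scikit-learn. This is an illustrative sketch under several assumptions, not the authors' implementation: scikit-learn has no FCBF selector, so a univariate F-test filter (`SelectKBest` with `f_classif`) stands in for it; `SVC` stands in for LIBSVM; `DecisionTreeClassifier` (CART) stands in for C4.5; and cost sensitivity is approximated by fitting `GaussianNB` with balanced per-sample weights. The function names, the number of selected features, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight


def make_base_learners(seed=0):
    """Primary learners; SVC and CART are stand-ins for LIBSVM and C4.5."""
    return [
        SVC(probability=True, random_state=seed),
        KNeighborsClassifier(n_neighbors=5),
        DecisionTreeClassifier(random_state=seed),
        RandomForestClassifier(n_estimators=100, random_state=seed),
    ]


def fit_stacking(X_train, y_train, n_features=20, seed=0):
    """Fit selector, primary learners, and a weighted naive Bayes metalearner."""
    # Assumption: a univariate filter replaces FCBF, which sklearn lacks.
    selector = SelectKBest(f_classif, k=n_features).fit(X_train, y_train)
    X_sel = selector.transform(X_train)

    base = make_base_learners(seed)
    # Out-of-fold class probabilities become the metalearner's inputs,
    # so the metalearner never sees predictions made on training folds.
    meta_X = np.hstack([
        cross_val_predict(clf, X_sel, y_train, cv=5, method="predict_proba")
        for clf in base
    ])
    # Refit each primary learner on all training data for later prediction.
    for clf in base:
        clf.fit(X_sel, y_train)

    # Assumption: balanced sample weights approximate misclassification
    # costs by giving minority-class samples proportionally more weight.
    weights = compute_sample_weight("balanced", y_train)
    meta = GaussianNB().fit(meta_X, y_train, sample_weight=weights)
    return selector, base, meta


def predict_stacking(model, X):
    """Predict labels by feeding primary-learner probabilities to the metalearner."""
    selector, base, meta = model
    X_sel = selector.transform(X)
    meta_X = np.hstack([clf.predict_proba(X_sel) for clf in base])
    return meta.predict(meta_X)


if __name__ == "__main__":
    # Synthetic imbalanced data in place of a gene expression dataset.
    X, y = make_classification(n_samples=300, n_features=200, n_informative=15,
                               weights=[0.8, 0.2], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = fit_stacking(X_tr, y_tr)
    print("test accuracy:", (predict_stacking(model, X_te) == y_te).mean())
```

Using out-of-fold probabilities for the metalearner's training set is the usual way to keep a stacking ensemble from overfitting to its own primary learners. An explicit cost matrix applied to the naive Bayes posteriors would be a closer match to a cost-sensitive metalearner than the sample weighting used here.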