College of Management and Economics, Tianjin University, Tianjin, 300072, China.
Business School, Nankai University, Tianjin, 300071, China.
Comput Methods Programs Biomed. 2021 Nov;211:106444. doi: 10.1016/j.cmpb.2021.106444. Epub 2021 Sep 29.
As blood testing is radiation-free, low-cost and simple to operate, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies account for the imbalanced class distribution of such data, which can impair classifier performance.
We propose a novel combined dynamic ensemble selection (DES) method for detecting COVID-19 from imbalanced complete blood count data. The method combines data preprocessing with an improved DES. First, a hybrid of the synthetic minority over-sampling technique and edited nearest neighbors (SMOTE-ENN) is used to balance the data and remove noise. Second, to improve DES performance, a novel hybrid multiple clustering and bagging classifier generation (HMCBCG) method is proposed to strengthen the diversity and local regional competence of the candidate classifiers.
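The SMOTE-ENN preprocessing step can be illustrated with a minimal, dependency-free sketch. It assumes numeric feature vectors, Euclidean distance, textbook SMOTE (interpolating between a minority sample and one of its k nearest minority neighbours) and Wilson's ENN rule applied to all classes with a majority vote of k=3 neighbours. The function names `smote`, `enn` and `smote_enn` are hypothetical, and this is not the authors' implementation or their parameter settings.

```python
import math
import random
from collections import Counter


def knn_indices(X, i, k, candidates):
    """Return the k indices from `candidates` nearest to X[i] (Euclidean), excluding i."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE: add synthetic minority samples until the minority class matches
    the largest class. Each new point lies at a random position on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority]
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(n_new):
        i = rng.choice(min_idx)
        j = rng.choice(knn_indices(X, i, k, min_idx))
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Wilson's edited nearest neighbours: drop every sample whose label
    disagrees with the majority label of its k nearest neighbours."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in knn_indices(X, i, k, all_idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """Oversample first, then clean the oversampled set (the SMOTE-ENN order)."""
    return enn(*smote(X, y, minority, k_smote, seed), k=k_enn)
```

Running ENN after SMOTE lets the cleaning step also discard synthetic points that land inside the majority region, as well as mislabeled or noisy originals.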
Experiments with three popular DES methods show that HMCBCG outperforms bagging alone. HMCBCG combined with KNORA-Eliminate (HMCBCG+KNE) achieves the best COVID-19 screening performance, with 99.81% accuracy, 99.86% F1, 99.78% G-mean and 99.81% AUC.
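KNE (KNORA-Eliminate) selects, for each query, only the candidate classifiers that correctly label every sample in the query's local region of competence, shrinking the region when no classifier qualifies. A minimal sketch follows, assuming a held-out dynamic selection set (`X_dsel`, `y_dsel`), Euclidean neighbours and a majority vote over the selected classifiers. The `Stump` pool is a hypothetical toy stand-in, not HMCBCG, and falling back to the whole pool when even the single nearest sample defeats every classifier is a choice made here, not necessarily the paper's.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """Hypothetical toy base classifier: predicts 1 if feature 0 exceeds the threshold."""
    threshold: float

    def __call__(self, v):
        return int(v[0] > self.threshold)


def knora_e_select(x, pool, X_dsel, y_dsel, k=7):
    """KNORA-Eliminate selection for a single query x.

    Keeps the classifiers that label all of the k nearest DSEL samples
    correctly; if none qualify, shrinks the region one neighbour at a time.
    """
    order = sorted(range(len(X_dsel)), key=lambda i: math.dist(x, X_dsel[i]))
    for size in range(k, 0, -1):
        region = order[:size]
        selected = [clf for clf in pool
                    if all(clf(X_dsel[i]) == y_dsel[i] for i in region)]
        if selected:
            return selected
    return list(pool)  # assumption: fall back to the whole pool


def knora_e_predict(x, pool, X_dsel, y_dsel, k=7):
    """Majority vote of the classifiers that KNORA-E selected for x."""
    selected = knora_e_select(x, pool, X_dsel, y_dsel, k)
    return Counter(clf(x) for clf in selected).most_common(1)[0][0]
```

Because selection depends on the query's neighbourhood, a pool whose members are accurate in different local regions gives KNE more to choose from. This is the motivation the abstract gives for strengthening diversity and local competence during pool generation.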
Compared to other advanced methods, our combined DES model can improve accuracy, G-mean, F1 and AUC of COVID-19 screening.
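For reference, the four reported metrics can be computed from binary labels, hard predictions and classifier scores as sketched below. This assumes the positive (COVID-19) class is encoded as 1 and uses the standard definitions: G-mean is the geometric mean of sensitivity and specificity, and AUC is computed as the Mann-Whitney probability that a random positive scores above a random negative, with ties counting one half. It is an illustrative sketch, not the authors' evaluation code.

```python
import math


def binary_metrics(y_true, y_pred, scores):
    """Accuracy, F1, G-mean and AUC for a binary problem with positive label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)

    # AUC as P(score of a random positive > score of a random negative), ties = 1/2
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean, "auc": auc}
```

Unlike accuracy, G-mean collapses toward zero if either class is poorly recognized, which is why it is commonly reported alongside accuracy on imbalanced data.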