A classifier ensemble approach for the missing feature problem.

Affiliation

Department of Information Engineering, University of Padua, Via Gradenigo, 6/B, 35131 Padova, Italy.

Publication information

Artif Intell Med. 2012 May;55(1):37-50. doi: 10.1016/j.artmed.2011.11.006. Epub 2011 Dec 20.

DOI: 10.1016/j.artmed.2011.11.006
PMID: 22188722

Abstract

OBJECTIVES

Many classification problems must deal with data that contains missing values. In such cases data imputation is critical. This paper evaluates the performance of several statistical and machine learning imputation methods, including our novel multiple imputation ensemble approach, using different datasets.

MATERIALS AND METHODS

Several state-of-the-art approaches are compared using different datasets. Some state-of-the-art classifiers (including support vector machines and input decimated ensembles) are tested with several imputation methods. The novel approach proposed in this work is a multiple imputation method based on random subspace, where each missing value is calculated considering a different cluster of the data. Clustering is performed with a fuzzy clustering algorithm.
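The idea described above can be illustrated with a minimal Python sketch. It is a loose interpretation of the abstract, not the authors' MATLAB implementation: the function names (`fuzzy_cmeans`, `cluster_impute`, `fit_ensemble`, `predict_ensemble`), the mean-value initialization, the number of clusters, and the nearest-centroid base classifier are all assumptions made for illustration. Each ensemble member draws a random feature subset, fills the missing values in that subspace with membership-weighted fuzzy c-means centers, and trains a simple classifier; predictions are combined by majority vote. Test patterns are assumed to be complete.

```python
import numpy as np

def fuzzy_cmeans(X, n_clusters, m=2.0, n_iter=50, rng=None):
    """Plain fuzzy c-means on complete data; returns (centers, memberships)."""
    rng = np.random.default_rng(rng)
    U = rng.dirichlet(np.ones(n_clusters), size=X.shape[0])  # rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

def cluster_impute(X, n_clusters, rng=None):
    """Fill NaNs with membership-weighted cluster centers (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # crude start: column means
    centers, U = fuzzy_cmeans(filled, n_clusters, rng=rng)
    return np.where(mask, U @ centers, X)  # observed values are left untouched

def fit_ensemble(X, y, n_members=10, subspace_frac=0.5, n_clusters=3, seed=0):
    """Each member: random feature subset, own imputation, nearest-centroid model."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(subspace_frac * X.shape[1])))
    classes = np.unique(y)
    members = []
    for _ in range(n_members):
        feats = rng.choice(X.shape[1], size=k, replace=False)
        Xi = cluster_impute(X[:, feats], n_clusters, rng=rng)
        centroids = np.stack([Xi[y == c].mean(axis=0) for c in classes])
        members.append((feats, centroids))
    return classes, members

def predict_ensemble(model, X):
    """Majority vote over the members; X must not contain missing values."""
    classes, members = model
    votes = np.zeros((X.shape[0], len(classes)))
    for feats, centroids in members:
        d = np.linalg.norm(X[:, feats][:, None, :] - centroids[None, :, :], axis=2)
        votes[np.arange(X.shape[0]), d.argmin(axis=1)] += 1
    return classes[votes.argmax(axis=1)]

# Toy usage on synthetic data (two Gaussian blobs, about 20% of entries removed).
rng = np.random.default_rng(0)
X_full = np.vstack([rng.normal(0.0, 1.0, (100, 6)), rng.normal(5.0, 1.0, (100, 6))])
y = np.repeat([0, 1], 100)
X_missing = X_full.copy()
X_missing[rng.random(X_full.shape) < 0.2] = np.nan
model = fit_ensemble(X_missing, y, n_clusters=2)
accuracy = (predict_ensemble(model, X_full) == y).mean()
print(f"accuracy on the complete data: {accuracy:.3f}")
```

Because every member imputes within its own feature subset, the same missing entry can receive a different estimate in different members, which is one way to read "multiple imputation" here. A stronger base classifier such as a support vector machine could replace the nearest-centroid rule without changing the structure.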

RESULTS

Our experiments have shown that the proposed multiple imputation approach based on clustering and a random subspace classifier outperforms several other state-of-the-art approaches. Using the Wilcoxon signed-rank test (null hypothesis rejected at the 0.05 significance level), we have shown that the best proposed approach is outperformed by the classifier trained on the original data (i.e., without missing values) only when >20% of the data are missing. Moreover, we have shown that coupling an existing imputation method with our cluster-based imputation outperforms the base method alone (level of significance ∼0.05).
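For readers unfamiliar with the test, the following Python sketch shows how a two-sided Wilcoxon signed-rank test can compare two methods over paired results. It is a self-contained illustration, not the authors' code: the function name `wilcoxon_signed_rank` is hypothetical, the p-value is computed exactly by enumerating the null distribution (practical only for small samples), and the accuracy figures in the usage lines are made-up toy numbers, not results from the paper.

```python
def wilcoxon_signed_rank(a, b):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied absolute differences share
    their average rank. Returns (statistic, p_value), where the
    statistic is the smaller of the positive and negative rank sums.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n  # twice the average rank, so every value stays an integer
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * mean of ranks (i + 1) .. (j + 1)
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    # Exact null distribution: each rank carries a plus sign with probability 1/2.
    counts = {0: 1}
    for r in ranks2:
        nxt = {}
        for s, c in counts.items():
            nxt[s] = nxt.get(s, 0) + c
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt
    stat2 = min(w_plus2, total2 - w_plus2)
    tail = sum(c for s, c in counts.items() if s <= stat2)
    p_value = min(1.0, 2.0 * tail / 2 ** n)
    return stat2 / 2.0, p_value

# Toy usage: accuracy (in percent) of two hypothetical methods on eight datasets.
acc_method_a = [91, 85, 78, 88, 94, 80, 73, 90]
acc_method_b = [88, 84, 74, 83, 87, 78, 75, 82]
stat, p = wilcoxon_signed_rank(acc_method_a, acc_method_b)
print(f"statistic = {stat}, p = {p:.4f}, reject at 0.05: {p < 0.05}")
```

Tie detection relies on exact equality of the absolute differences, so integer or rounded scores are safer inputs than raw floating-point accuracies. For larger samples a library routine with a normal approximation would be the more practical choice.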

CONCLUSION

Starting from the assumptions that the feature set must be partially redundant and that the redundancy is distributed randomly over the feature set, we have proposed a method that works quite well even when a large percentage of the features is missing (≥30%). Our best approach is available (MATLAB code) at bias.csr.unibo.it/nanni/MI.rar.

Similar articles

1. A classifier ensemble approach for the missing feature problem.
   Artif Intell Med. 2012 May;55(1):37-50. doi: 10.1016/j.artmed.2011.11.006. Epub 2011 Dec 20.
2. Handling missing values in support vector machine classifiers.
   Neural Netw. 2005 Jun-Jul;18(5-6):684-92. doi: 10.1016/j.neunet.2005.06.025.
3. Wavelet images and Chou's pseudo amino acid composition for protein classification.
   Amino Acids. 2012 Aug;43(2):657-65. doi: 10.1007/s00726-011-1114-9. Epub 2011 Oct 13.
4. Rotation forest: A new classifier ensemble method.
   IEEE Trans Pattern Anal Mach Intell. 2006 Oct;28(10):1619-30. doi: 10.1109/TPAMI.2006.211.
5. Robust imputation method for missing values in microarray data.
   BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S6. doi: 10.1186/1471-2105-8-S2-S6.
6. Recovering the missing components in a large noisy low-rank matrix: application to SFM.
   IEEE Trans Pattern Anal Mach Intell. 2004 Aug;26(8):1051-63. doi: 10.1109/TPAMI.2004.52.
7. LESS: a model-based classifier for sparse subspaces.
   IEEE Trans Pattern Anal Mach Intell. 2005 Sep;27(9):1496-500. doi: 10.1109/TPAMI.2005.182.
8. Towards clustering of incomplete microarray data without the use of imputation.
   Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.
9. Bias in error estimation when using cross-validation for model selection.
   BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.
10. Random subspace ensembles for FMRI classification.
    IEEE Trans Med Imaging. 2010 Feb;29(2):531-42. doi: 10.1109/TMI.2009.2037756.

Cited by

1. Improving Outcome Predictions for Patients Receiving Mechanical Circulatory Support by Optimizing Imputation of Missing Values.
   Circ Cardiovasc Qual Outcomes. 2021 Sep;14(9):e007071. doi: 10.1161/CIRCOUTCOMES.120.007071. Epub 2021 Sep 14.
2. Maximizing the reusability of gene expression data by predicting missing metadata.
   PLoS Comput Biol. 2020 Nov 6;16(11):e1007450. doi: 10.1371/journal.pcbi.1007450. eCollection 2020 Nov.
3. A Long Short-Term Memory Ensemble Approach for Improving the Outcome Prediction in Intensive Care Unit.
   Comput Math Methods Med. 2019 Nov 3;2019:8152713. doi: 10.1155/2019/8152713. eCollection 2019.
4. mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.
   FEBS Open Bio. 2017 Jun 19;7(7):1051-1059. doi: 10.1002/2211-5463.12247. eCollection 2017 Jul.
5. Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values.
   PLoS One. 2016 May 19;11(5):e0155119. doi: 10.1371/journal.pone.0155119. eCollection 2016.
6. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data.
   J Biomed Inform. 2015 Dec;58:198-207. doi: 10.1016/j.jbi.2015.10.004. Epub 2015 Oct 21.
7. Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion.
   J Comput Biol. 2015 Jun;22(6):595-608. doi: 10.1089/cmb.2014.0158. Epub 2015 Feb 6.
8. Zheng classification with missing feature values using local-validity approach.
   Evid Based Complement Alternat Med. 2013;2013:493626. doi: 10.1155/2013/493626. Epub 2013 Dec 23.