An experimental comparison of feature selection methods on two-class biomedical datasets.

Authors

Drotár P, Gazda J, Smékal Z

Affiliations

Department of Telecommunications, Brno University of Technology, Technická 12, 61200 Brno, Czech Republic.

Department of Computers and Informatics, Technical University of Kosice, Letna 9, 0401 Kosice, Slovakia.

Publication information

Comput Biol Med. 2015 Nov 1;66:1-10. doi: 10.1016/j.compbiomed.2015.08.010. Epub 2015 Aug 24.

DOI:10.1016/j.compbiomed.2015.08.010
PMID:26327447
Abstract

Feature selection is a significant part of many machine learning applications dealing with small-sample and high-dimensional data. Choosing the most important features is an essential step for knowledge discovery in many areas of biomedical informatics. The increased popularity of feature selection methods and their frequent utilisation raise challenging new questions about the interpretability and stability of feature selection techniques. In this study, we compared the behaviour of ten state-of-the-art filter methods for feature selection in terms of their stability, similarity, and influence on prediction performance. All of the experiments were conducted on eight two-class datasets from biomedical areas. While entropy-based feature selection appears to be the most stable, the feature selection techniques yielding the highest prediction performance are the minimum redundancy maximum relevance (mRMR) method and feature selection based on the Bhattacharyya distance. In general, univariate feature selection techniques perform similarly to or even better than more complex multivariate feature selection techniques on high-dimensional datasets. However, on more complex and smaller datasets, multivariate methods slightly outperform univariate techniques.
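To make the notion of selection stability concrete, the following is a minimal sketch and not the authors' protocol. It assumes a simple univariate filter score (absolute difference of class means divided by the summed class standard deviations) and measures stability as the mean pairwise Jaccard index between the top-k feature sets chosen on bootstrap resamples. The abstract does not say which stability measure or resampling scheme the study used, so the scoring function, the Jaccard index, and every name here (`score_features`, `top_k`, `stability`, `make_data`) are illustrative assumptions.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def score_features(X, y):
    """Hypothetical univariate filter: |mean difference| / summed class spread."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-12  # guard against division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def top_k(X, y, k):
    """Indices of the k highest-scoring features, as a set."""
    scores = score_features(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two non-empty feature sets."""
    return len(a & b) / len(a | b)


def stability(X, y, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of top-k sets over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    while len(subsets) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:  # a resample must contain both classes
            continue
        subsets.append(top_k([X[i] for i in idx], ys, k))
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


def make_data(n_samples=40, n_features=50, n_informative=3, seed=1):
    """Synthetic two-class data: only the first n_informative features differ."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


X, y = make_data()
print("selected features:", sorted(top_k(X, y, 3)))
print("stability (mean Jaccard):", round(stability(X, y, 3), 3))
```

A stability value near 1 means the filter picks almost the same features on every resample; values near 0 mean the selection is driven mostly by sampling noise, which is the typical risk with small-sample, high-dimensional data.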
