A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining.

Affiliations

School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, PR China; School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, PR China.

School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, PR China.

Publication information

Comput Biol Med. 2017 Oct 1;89:264-274. doi: 10.1016/j.compbiomed.2017.08.021. Epub 2017 Aug 24.

DOI: 10.1016/j.compbiomed.2017.08.021
PMID: 28850898
Abstract

Filter feature selection techniques have been widely used to mine biomedical data. Recently, a risk has been revealed in the classical filter method minimal-Redundancy-Maximal-Relevance (mRMR): a specific part of the redundancy, called irrelevant redundancy, may be included in the method's minimal-redundancy component. A few attempts have therefore been made to eliminate the irrelevant redundancy by attaching additional procedures to mRMR, such as Kernel Canonical Correlation Analysis based mRMR (KCCAmRMR). In the present study, a novel filter feature selection method based on the Maximal Information Coefficient (MIC) and Gram-Schmidt Orthogonalization (GSO), named Orthogonal MIC Feature Selection (OMICFS), is proposed to solve this problem. Unlike other improved approaches under the max-relevance and min-redundancy criterion, the proposed method uses MIC to quantify the relevance between feature variables and the target variable, and uses GSO to compute the orthogonalized variable of a candidate feature with respect to the previously selected features; max-relevance and min-redundancy are then optimized indirectly by maximizing the MIC relevance between the GSO-orthogonalized variable and the target. This orthogonalization strategy allows OMICFS to exclude the irrelevant redundancy without any additional procedures. To verify its performance, OMICFS was compared with other filter feature selection methods in terms of both classification accuracy and computational efficiency through classification experiments on two types of biomedical datasets. The results showed that OMICFS outperforms the other methods in most cases. In addition, the differences between these methods were analyzed, and the application of OMICFS to mining high-dimensional biomedical data is discussed. The Matlab code for the proposed method is available at https://github.com/lhqxinghun/bioinformatics/tree/master/OMICFS/.
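The abstract describes the OMICFS selection loop in words: orthogonalize each candidate feature against the already selected features with Gram-Schmidt, then pick the candidate whose orthogonalized residual has the highest MIC with the target. The Python sketch below illustrates that loop under stated assumptions; it is not the authors' Matlab code (linked above). Assumptions: features are passed as a list of equal-length numeric columns, the target is a discrete class label, columns are mean-centred before orthogonalization, and `approx_mic` is a hypothetical, crude stand-in for MIC that only scans equal-frequency grids on the feature axis with the target partition fixed to its classes (the full MINE estimator also optimizes the grid, so this can underestimate MIC). The names `approx_mic` and `omicfs_sketch` are illustrative.

```python
import math
import random
from collections import Counter


def approx_mic(x, y, alpha=0.6):
    """Hypothetical, crude MIC stand-in between numeric x and class labels y.

    Scans equal-frequency grids on x only (the y partition is fixed to the
    classes) and returns the best mutual information normalized by
    log(min(nx, ny)), with nx * ny <= n ** alpha. Not the MINE algorithm.
    """
    n = len(x)
    classes = sorted(set(y))
    ny = len(classes)
    if ny < 2:
        return 0.0
    y_idx = [classes.index(v) for v in y]
    order = sorted(range(n), key=lambda i: x[i])  # sample indices ranked by x
    max_nx = max(2, int(n ** alpha // ny))  # largest nx with nx*ny <= n^alpha (>= 2)
    best = 0.0
    for nx in range(2, max_nx + 1):
        bins = [0] * n
        for rank, i in enumerate(order):
            bins[i] = rank * nx // n  # equal-frequency bin of sample i
        joint = Counter(zip(bins, y_idx))
        px, py = Counter(bins), Counter(y_idx)
        mi = sum(c / n * math.log(c * n / (px[b] * py[k]))
                 for (b, k), c in joint.items())
        best = max(best, mi / math.log(min(nx, ny)))
    return best


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omicfs_sketch(features, y, k, eps=1e-9):
    """Greedy selection: Gram-Schmidt residual of each candidate, scored by MIC."""
    cols = [center(f) for f in features]
    basis, selected = [], []  # basis: orthonormal directions of selected features
    for _ in range(min(k, len(cols))):
        best_j, best_score, best_r = None, -1.0, None
        for j, col in enumerate(cols):
            if j in selected:
                continue
            r = list(col)
            for q in basis:  # remove the component along each selected direction
                proj = dot(r, q)
                r = [a - proj * b for a, b in zip(r, q)]
            norm = math.sqrt(dot(r, r))
            score = approx_mic(r, y) if norm > eps else 0.0  # fully redundant -> 0
            if score > best_score:  # strict '>' keeps the first index on ties
                best_j, best_score, best_r = j, score, r
        selected.append(best_j)
        norm = math.sqrt(dot(best_r, best_r))
        if norm > eps:
            basis.append([a / norm for a in best_r])
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    n = 64
    a = [i % 2 for i in range(n)]         # hidden factor 1 (balanced)
    b = [(i // 2) % 2 for i in range(n)]  # hidden factor 2, orthogonal to a once centred
    y = [2 * ai + bi for ai, bi in zip(a, b)]  # 4-class toy target
    f0 = [v + 0.05 * rng.gauss(0, 1) for v in a]
    f1 = [v + 0.05 * rng.gauss(0, 1) for v in f0]  # near-copy of f0 (redundant)
    f2 = [v + 0.05 * rng.gauss(0, 1) for v in b]
    f3 = [rng.gauss(0, 1) for _ in range(n)]       # pure noise
    print(omicfs_sketch([f0, f1, f2, f3], y, k=2))
```

On this toy data the sketch is expected to take f0 (or its near-copy f1) first and then f2: once one of the pair is selected, the Gram-Schmidt residual of the other is essentially noise, so its MIC score collapses, while f2 is nearly orthogonal to f0 and keeps its relevance. Scoring the residual rather than the raw feature is what lets redundancy be penalized without a separate redundancy term.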

Similar articles

1
A filter feature selection method based on the Maximal Information Coefficient and Gram-Schmidt Orthogonalization for biomedical data mining.
Comput Biol Med. 2017 Oct 1;89:264-274. doi: 10.1016/j.compbiomed.2017.08.021. Epub 2017 Aug 24.
2
A new improved maximal relevance and minimal redundancy method based on feature subset.
J Supercomput. 2023;79(3):3157-3180. doi: 10.1007/s11227-022-04763-2. Epub 2022 Aug 30.
3
Semisupervised Feature Selection Based on Relevance and Redundancy Criteria.
IEEE Trans Neural Netw Learn Syst. 2017 Sep;28(9):1974-1984. doi: 10.1109/TNNLS.2016.2562670. Epub 2016 May 20.
4
FSCME: A Feature Selection Method Combining Copula Correlation and Maximal Information Coefficient by Entropy Weights.
IEEE J Biomed Health Inform. 2024 Sep;28(9):5638-5648. doi: 10.1109/JBHI.2024.3409628. Epub 2024 Sep 5.
5
Information-theoretic approaches to SVM feature selection for metagenome read classification.
Comput Biol Chem. 2011 Jun;35(3):199-209. doi: 10.1016/j.compbiolchem.2011.04.007. Epub 2011 May 13.
6
Supervised Relevance-Redundancy assessments for feature selection in omics-based classification scenarios.
J Biomed Inform. 2023 Aug;144:104457. doi: 10.1016/j.jbi.2023.104457. Epub 2023 Jul 23.
7
Classification of high dimensional biomedical data based on feature selection using redundant removal.
PLoS One. 2019 Apr 9;14(4):e0214406. doi: 10.1371/journal.pone.0214406. eCollection 2019.
8
Wavelength selection method for near-infrared spectroscopy based on Max-Relevance Min-Redundancy.
Spectrochim Acta A Mol Biomol Spectrosc. 2024 Apr 5;310:123933. doi: 10.1016/j.saa.2024.123933. Epub 2024 Jan 22.
9
Kernel Partial Least Squares Feature Selection Based on Maximum Weight Minimum Redundancy.
Entropy (Basel). 2023 Feb 10;25(2):325. doi: 10.3390/e25020325.
10
Chi-MIC-share: a new feature selection algorithm for quantitative structure-activity relationship models.
RSC Adv. 2020 May 27;10(34):19852-19860. doi: 10.1039/d0ra00061b. eCollection 2020 May 26.

Cited by

1
The structure is the message: Preserving experimental context through tensor decomposition.
Cell Syst. 2024 Aug 21;15(8):679-693. doi: 10.1016/j.cels.2024.07.004.
2
Robust classification of heart valve sound based on adaptive EMD and feature fusion.
PLoS One. 2022 Dec 8;17(12):e0276264. doi: 10.1371/journal.pone.0276264. eCollection 2022.
3
A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information.
Comput Intell Neurosci. 2021 Dec 28;2021:3569632. doi: 10.1155/2021/3569632. eCollection 2021.
4
HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis.
Front Cell Dev Biol. 2021 Jun 30;9:696359. doi: 10.3389/fcell.2021.696359. eCollection 2021.
5
An improved algorithm for the maximal information coefficient and its application.
R Soc Open Sci. 2021 Feb 10;8(2):201424. doi: 10.1098/rsos.201424.
6
A Neighborhood Rough Sets-Based Attribute Reduction Method Using Lebesgue and Entropy Measures.
Entropy (Basel). 2019 Feb 1;21(2):138. doi: 10.3390/e21020138.
7
Detection and Comparative Analysis of Methylomic Biomarkers of Rheumatoid Arthritis.
Front Genet. 2020 Mar 27;11:238. doi: 10.3389/fgene.2020.00238. eCollection 2020.
8
Sparse support vector machines with L approximation for ultra-high dimensional omics data.
Artif Intell Med. 2019 May;96:134-141. doi: 10.1016/j.artmed.2019.04.004. Epub 2019 Apr 30.
9
Age Is Important for the Early-Stage Detection of Breast Cancer on Both Transcriptomic and Methylomic Biomarkers.
Front Genet. 2019 Mar 26;10:212. doi: 10.3389/fgene.2019.00212. eCollection 2019.