MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy-defined energy.

Affiliations

School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China.

School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China.

Publication information

Brief Bioinform. 2020 Mar 23;21(2):687-698. doi: 10.1093/bib/bbz021.

DOI: 10.1093/bib/bbz021
PMID: 30860571

Abstract

Recursive feature elimination (RFE), one of the most popular feature selection algorithms, has been extensively applied in bioinformatics. During training, a group of candidate subsets is generated by iteratively eliminating the least important features from the original feature set. However, how to determine the optimal subset among these candidates remains ambiguous. In most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Which of the two to use, whether to use both, and how they affect prediction performance are still open questions. In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach that takes the effect of both factors into account. The subset decision problem was mapped into subset-accuracy space, where it became an energy-minimization problem. We also provided a mathematical description of the relationship between overall accuracy and SS using Gaussian mixture models together with spline fitting. In addition, we comprehensively reviewed a variety of state-of-the-art bioinformatics applications that use RFE, and compared their approaches to deciding the final subset from all candidate subsets with MinE-RFE on diverse bioinformatics data sets. We also compared MinE-RFE with several widely used feature selection algorithms. The comparative results show that the proposed approach performed best among all the approaches compared. To facilitate the use of MinE-RFE, we further established a user-friendly web server implementing the proposed approach, accessible at http://qgking.wicp.net/MinE/. We expect this web server to be a useful tool for the research community.
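
The abstract describes two steps: RFE produces a sequence of candidate subsets, and one of them is then chosen by minimizing an energy defined over subset size and accuracy. The sketch below illustrates only that general shape. The abstract does not give the energy definition, and the Gaussian mixture and spline modeling are not reproduced here, so the weighted-sum `energy` function, its `alpha` parameter, and all function names and toy data are assumptions made for illustration, not the authors' method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Subset = Tuple[str, ...]


def rfe_candidates(
    features: Sequence[str],
    importance: Callable[[Subset], Dict[str, float]],
    accuracy: Callable[[Subset], float],
) -> List[Tuple[Subset, float]]:
    """Run RFE and return every candidate subset with its accuracy."""
    current: Subset = tuple(features)
    candidates: List[Tuple[Subset, float]] = []
    while current:
        candidates.append((current, accuracy(current)))
        scores = importance(current)  # re-rank on the current subset
        weakest = min(current, key=lambda f: scores[f])
        current = tuple(f for f in current if f != weakest)
    return candidates


def energy(size: int, acc: float, total: int, alpha: float = 0.8) -> float:
    """Hypothetical energy (assumption): weighted error plus relative subset size."""
    return alpha * (1.0 - acc) + (1.0 - alpha) * size / total


def select_subset(
    candidates: List[Tuple[Subset, float]], total: int, alpha: float = 0.8
) -> Tuple[Subset, float]:
    """Pick the candidate whose (size, accuracy) point has the lowest energy."""
    return min(candidates, key=lambda c: energy(len(c[0]), c[1], total, alpha))


# Toy stand-ins (assumptions): fixed importances, and an accuracy that
# depends only on how many of the two informative features are kept.
TOY_IMPORTANCE = {"g1": 0.9, "g2": 0.7, "g3": 0.2, "g4": 0.1}


def toy_importance(subset: Subset) -> Dict[str, float]:
    return {f: TOY_IMPORTANCE[f] for f in subset}


def toy_accuracy(subset: Subset) -> float:
    informative = {"g1", "g2"} & set(subset)
    return 0.5 + 0.2 * len(informative)


candidates = rfe_candidates(list(TOY_IMPORTANCE), toy_importance, toy_accuracy)
best_subset, best_acc = select_subset(candidates, total=len(TOY_IMPORTANCE))
print(best_subset, best_acc)
```

In this toy run, selecting by accuracy alone cannot distinguish the three largest subsets, and selecting by size alone would pick the single-feature subset with lower accuracy. Combining both factors in one objective resolves that ambiguity, which is the question the abstract raises. In practice the `importance` and `accuracy` callables would wrap a trained classifier and a cross-validation routine.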

Similar articles

1
MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy-defined energy.
Brief Bioinform. 2020 Mar 23;21(2):687-698. doi: 10.1093/bib/bbz021.
2
Ensemble Feature Learning of Genomic Data Using Support Vector Machine.
PLoS One. 2016 Jun 15;11(6):e0157330. doi: 10.1371/journal.pone.0157330. eCollection 2016.
3
Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE.
Genes (Basel). 2018 Jun 15;9(6):301. doi: 10.3390/genes9060301.
4
Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics.
Molecules. 2017 Dec 26;23(1):52. doi: 10.3390/molecules23010052.
5
An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data.
Biomed Res Int. 2018 Aug 30;2018:7538204. doi: 10.1155/2018/7538204. eCollection 2018.
6
An efficient model selection for linear discriminant function-based recursive feature elimination.
J Biomed Inform. 2022 May;129:104070. doi: 10.1016/j.jbi.2022.104070. Epub 2022 Apr 15.
7
Effective hybrid feature selection using different bootstrap enhances cancers classification performance.
BioData Min. 2022 Sep 30;15(1):24. doi: 10.1186/s13040-022-00304-y.
8
Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods.
Comput Biol Chem. 2022 Oct;100:107747. doi: 10.1016/j.compbiolchem.2022.107747. Epub 2022 Jul 29.
9
Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE.
BMC Bioinformatics. 2006 Dec 25;7:543. doi: 10.1186/1471-2105-7-543.
10
Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis.
IEEE/ACM Trans Comput Biol Bioinform. 2007 Jul-Sep;4(3):365-81. doi: 10.1109/TCBB.2007.70224.

Cited by

1
Predicting podoplanin expression and prognostic significance in high-grade glioma based on TCGA TCIA radiomics.
PLoS One. 2025 Jun 24;20(6):e0325964. doi: 10.1371/journal.pone.0325964. eCollection 2025.
2
Advancing the development of deep learning and machine learning models for oral drugs through diverse descriptor classes: a focus on pharmacokinetic parameters (Vdss and PPB).
Mol Divers. 2025 Jun 11. doi: 10.1007/s11030-025-11235-1.
3
NETosis Genes and Pathomic Signature: A Novel Prognostic Marker for Ovarian Serous Cystadenocarcinoma.
J Imaging Inform Med. 2024 Dec 11. doi: 10.1007/s10278-024-01366-6.
4
A novel MissForest-based missing values imputation approach with recursive feature elimination in medical applications.
BMC Med Res Methodol. 2024 Nov 8;24(1):269. doi: 10.1186/s12874-024-02392-2.
5
Single-cell transcriptome analysis reveals immune microenvironment changes and insights into the transition from DCIS to IDC with associated prognostic genes.
J Transl Med. 2024 Oct 3;22(1):894. doi: 10.1186/s12967-024-05706-6.
6
Using machine learning to improve anaphylaxis case identification in medical claims data.
JAMIA Open. 2024 Jun 21;7(2):ooae037. doi: 10.1093/jamiaopen/ooae037. eCollection 2024 Jul.
7
Bitter-RF: A random forest machine model for recognizing bitter peptides.
Front Med (Lausanne). 2023 Jan 26;10:1052923. doi: 10.3389/fmed.2023.1052923. eCollection 2023.
8
Radiation Type- and Dose-Specific Transcriptional Responses across Healthy and Diseased Mammalian Tissues.
Antioxidants (Basel). 2022 Nov 18;11(11):2286. doi: 10.3390/antiox11112286.
9
SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles.
Front Genet. 2021 Dec 20;12:809001. doi: 10.3389/fgene.2021.809001. eCollection 2021.
10
Pretraining model for biological sequence data.
Brief Funct Genomics. 2021 Jun 9;20(3):181-195. doi: 10.1093/bfgp/elab025.