SVM-RFE: selection and visualization of the most relevant features through non-linear kernels.

Affiliations

Department of Genetics, Microbiology and Statistics, Faculty of Biology, Universitat de Barcelona, Diagonal, 643, 08028, Barcelona, Catalonia, Spain.

Department of Osteopathic Medical Specialties, Michigan State University, 909 Fee Road, Room B 309 West Fee Hall, East Lansing, MI, 48824, USA.

Publication information

BMC Bioinformatics. 2018 Nov 19;19(1):432. doi: 10.1186/s12859-018-2451-4.

DOI:10.1186/s12859-018-2451-4

PMID:30453885

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6245920/

Abstract

BACKGROUND

Support vector machines (SVM) are a powerful tool for analyzing data in which the number of predictors is approximately equal to or larger than the number of observations. Originally, however, the application of SVM to biomedical data was limited because SVM was not designed to evaluate the importance of predictor variables. Building prediction models from only the most relevant variables is essential in biomedical research. Substantial work has since been done to allow assessment of variable importance in SVM models, but it has focused on SVM implemented with linear kernels. The power of SVM as a prediction model comes from the flexibility provided by non-linear kernels. Moreover, SVM has been extended to model survival outcomes. This paper extends the Recursive Feature Elimination (RFE) algorithm by proposing three approaches to rank variables based on non-linear SVM and SVM for survival analysis.
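For orientation, the classical linear-kernel SVM-RFE that this work extends can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the usual ranking criterion of the squared weight w_j^2, removes one variable per iteration, and uses a small full-batch subgradient solver as a stand-in for a real SVM library. The function names, the toy data, and the hyperparameters are all hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.05, epochs=200, lr=0.1):
    """Fit a linear SVM (no intercept) by full-batch subgradient descent on the
    L2-regularised hinge loss. Illustrative only, not a production solver."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(p):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y):
    """Return feature indices ordered from most to least relevant."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Drop the variable with the smallest squared weight, then refit.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last eliminated = most relevant

# Toy data (hypothetical): only feature 0 determines the class label.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(80)]
y = [1 if row[0] > 0 else -1 for row in X]
ranking = svm_rfe(X, y)
print(ranking)
```

The weight vector w is only available explicitly for a linear kernel, which is why ranking variables under a non-linear kernel needs a different criterion.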

RESULTS

The proposed algorithms allow visualization of each of the RFE iterations and, hence, identification of the most relevant predictors of the response variable. Using simulation studies based on time-to-event outcomes and three real datasets, we evaluate the three methods, which are based on pseudo-samples and kernel principal component analysis, and compare them with the original SVM-RFE algorithm for non-linear kernels. In the simulation studies, where the truly most relevant variables can be compared with the variable ranks produced by each algorithm, the three proposed algorithms generally performed better than the gold-standard RFE for non-linear kernels. Generally, the RFE-pseudo-samples approach outperformed the other three methods, even when variables were assumed to be correlated, in all tested scenarios.
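The pseudo-sample idea can be illustrated with a single ranking pass under a non-linear kernel. The sketch below is one plausible reading, not the authors' implementation. It assumes roughly centred data, sweeps one variable over its observed range while holding the others at 0, and scores each variable by the median absolute deviation (MAD) of the resulting decision values. The kernelised Pegasos trainer with deterministic cycling is a simple stand-in for a full SVM solver. All names, the toy data, and the settings (gamma, lam, q) are hypothetical.

```python
import math
import random
from statistics import median

def rbf(a, b, gamma=2.0):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def train_kernel_svm(X, y, lam=0.01, passes=20):
    """Kernelised Pegasos with deterministic cycling over the samples
    (a stand-in for a full SVM solver). Returns a decision function."""
    n = len(X)
    K = [[rbf(a, b) for b in X] for a in X]  # precomputed Gram matrix
    alpha = [0] * n
    t = 0
    for _ in range(passes):
        for i in range(n):
            t += 1
            f_i = sum(alpha[j] * y[j] * K[j][i] for j in range(n)) / (lam * t)
            if y[i] * f_i < 1:  # margin violation: count this sample once more
                alpha[i] += 1
    scale = 1.0 / (lam * t)
    return lambda x: scale * sum(
        a * yj * rbf(xj, x) for a, yj, xj in zip(alpha, y, X) if a
    )

def pseudo_sample_scores(decision, X, q=11):
    """Score each variable by the MAD of the decision values along q equally
    spaced pseudo-samples of that variable, with the other variables set to 0."""
    p = len(X[0])
    scores = []
    for j in range(p):
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        values = []
        for k in range(q):
            pseudo = [0.0] * p
            pseudo[j] = lo + (hi - lo) * k / (q - 1)
            values.append(decision(pseudo))
        med = median(values)
        scores.append(median(abs(v - med) for v in values))
    return scores

# Toy data (hypothetical): the label depends non-linearly on feature 0 only.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
y = [1 if abs(row[0]) > 0.5 else -1 for row in X]
decision = train_kernel_svm(X, y)
scores = pseudo_sample_scores(decision, X)
best = max(range(len(scores)), key=lambda j: scores[j])
print(scores, best)
```

Inside an RFE loop, the variable with the lowest score would be removed and the model refit on the remaining variables. Plotting the decision values against the pseudo-sample values for each variable gives the kind of per-iteration picture of direction and strength of association that the abstract refers to.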

CONCLUSIONS

The proposed approaches, particularly RFE-pseudo-samples, can be used to accurately select variables and to assess the direction and strength of associations between predictors and outcomes when analyzing biomedical data with SVM for categorical or time-to-event responses. These approaches perform better than the classical RFE of Guyon under realistic scenarios for the structure of biomedical data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd17/6245920/a242c27ec898/12859_2018_2451_Fig1_HTML.jpg
