A novel machine learning strategy for model selections - Stepwise Support Vector Machine (StepSVM).

Affiliation

Institute of Public Health, School of Medicine, National Yang-Ming University, Taipei, Taiwan.

Publication information

PLoS One. 2020 Aug 27;15(8):e0238384. doi: 10.1371/journal.pone.0238384. eCollection 2020.

DOI:10.1371/journal.pone.0238384

PMID:32853243

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7451646/

Abstract

An essential aspect of medical research is predicting health outcomes and identifying the factors that matter. Numerous model selection methods have therefore been developed in recent years. In the era of big data, machine learning has been broadly adopted for data analysis; in particular, the Support Vector Machine (SVM) performs very well in classification and prediction with high-dimensional data. This study proposes a novel model selection strategy, the Stepwise Support Vector Machine (StepSVM). The strategy uses the SVM to conduct a modified stepwise selection, in which the tuning parameter can be determined by 10-fold cross-validation that minimizes the mean squared error. Two popular methods, conventional stepwise logistic regression and SVM Recursive Feature Elimination (SVM-RFE), were compared with the StepSVM. The stability and accuracy of the three strategies were evaluated in simulation studies with a complex hierarchical structure. Up to five variables were selected to predict the dichotomous cancer remission of a lung cancer patient. For stepwise logistic regression, the mean C-statistic was 69.19%. The overall accuracy of the SVM-RFE was estimated at 70.62%. In contrast, the StepSVM provided the highest prediction accuracy, 80.57%. Although the StepSVM is more time-consuming, it is more consistent and outperforms the other two methods.
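The abstract does not spell out the modified stepwise procedure, so the following is only a rough sketch of the general idea: a plain greedy forward selection that adds, one at a time, the variable that most lowers a cross-validated error, stopping at five variables. The function names `forward_stepwise` and `svm_cv_mse`, the linear kernel, and the default `C=1.0` are illustrative assumptions, not details taken from the paper. The SVM scorer relies on scikit-learn and expects `X` to be a NumPy array with 0/1 labels in `y`.

```python
from typing import Callable, List, Sequence


def forward_stepwise(
    candidates: Sequence[int],
    cv_error: Callable[[List[int]], float],
    max_features: int = 5,
) -> List[int]:
    """Greedy forward selection: repeatedly add the feature that most lowers cv_error."""
    selected: List[int] = []
    best_error = float("inf")
    while len(selected) < max_features:
        remaining = [j for j in candidates if j not in selected]
        if not remaining:
            break
        # Score every one-feature extension of the current subset.
        trials = [(cv_error(selected + [j]), j) for j in remaining]
        error, feature = min(trials)
        if error >= best_error:
            break  # no candidate improves the cross-validated error
        selected.append(feature)
        best_error = error
    return selected


def svm_cv_mse(X, y, C: float = 1.0, folds: int = 10) -> Callable[[List[int]], float]:
    """Build a cv_error function from an SVM (assumed linear kernel; needs scikit-learn)."""
    # Imported lazily so the selection loop above has no hard dependency.
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def cv_error(columns: List[int]) -> float:
        accuracy = cross_val_score(
            SVC(kernel="linear", C=C), X[:, columns], y, cv=folds, scoring="accuracy"
        )
        # For 0/1 labels the mean squared error equals the misclassification rate.
        return 1.0 - accuracy.mean()

    return cv_error
```

With a feature matrix `X` and binary outcome `y`, a call might look like `forward_stepwise(range(X.shape[1]), svm_cv_mse(X, y))`. Keeping the error function separate from the selection loop makes it straightforward to swap in a different kernel or to tune `C` by the same cross-validated criterion.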

