Optimal SVM parameter selection for non-separable and unbalanced datasets.

Author information

Jiang Peng, Missoum Samy, Chen Zhao

Affiliations

Aerospace and Mechanical Engineering Department, University of Arizona, Tucson, Arizona.

Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona.

Publication information

Struct Multidiscipl Optim. 2014 Oct 1;50(4):523-535. doi: 10.1007/s00158-014-1105-z.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4170691/

Abstract

This article presents a study of three validation metrics used to select the optimal parameters of a support vector machine (SVM) classifier when the dataset is non-separable and unbalanced. This situation is common when data are obtained experimentally or clinically. The three metrics selected in this work are the area under the ROC curve (AUC), accuracy, and balanced accuracy. The metrics are tested using computational data only, which makes it possible to create fully separable datasets. Non-separable datasets representative of a real-world problem can then be created by projecting the separable data onto a lower-dimensional subspace. Because the separable dataset is known here (which is never the case in real-world problems), it provides a reference against which the three validation metrics are compared using a quantity referred to as the "weighted likelihood". As an application example, the study investigates a classification model for hip fracture prediction, with data obtained from a parameterized finite element model of a femur. The performance of the validation metrics is studied for several levels of separability, ratios of unbalance, and training set sizes.
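To make the three validation metrics concrete, here is a minimal sketch (not the authors' code) of how each one can be computed for a binary classifier. It assumes labels in {0, 1} with 1 as the minority ("positive") class, and real-valued scores such as SVM decision values where larger means "more likely positive". The function names and the small validation set are hypothetical and purely illustrative, not data from the study. AUC is computed through its Mann-Whitney form: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as 1/2.

```python
from typing import Sequence

# Assumptions: labels are 0 (majority) or 1 (minority, "positive");
# scores are real-valued classifier outputs (e.g. SVM decision values),
# larger meaning "more likely positive". Names are illustrative only.


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (Mann-Whitney form); ties contribute 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 9:1 unbalanced validation set (illustrative numbers only).
y_true = [0] * 9 + [1]
scores = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.5, -0.3, 0.1, 0.4, 0.2]
y_pred = [1 if s > 0 else 0 for s in scores]  # threshold the scores at 0
y_trivial = [0] * 10                           # always predict the majority class

print(accuracy(y_true, y_trivial), balanced_accuracy(y_true, y_trivial))  # 0.9 0.5
print(accuracy(y_true, y_pred), round(balanced_accuracy(y_true, y_pred), 3))  # 0.8 0.889
print(round(roc_auc(y_true, scores), 3))  # 0.889
```

With these illustrative numbers, plain accuracy prefers the trivial majority-class predictor (0.9 vs 0.8). Balanced accuracy prefers the thresholded scorer (0.889 vs 0.5). AUC does not depend on any threshold at all. This is the kind of disagreement that makes the choice of validation metric matter on unbalanced data.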
