Efficient Selection of Gaussian Kernel SVM Parameters for Imbalanced Data.

Affiliation

Division of Biometry, Department of Agronomy, National Taiwan University, Taipei 106216, Taiwan.

Publication information

Genes (Basel). 2023 Feb 25;14(3):583. doi: 10.3390/genes14030583.

DOI:10.3390/genes14030583
PMID:36980852
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10048125/
Abstract

In medical data mining, class prediction models are widely used for data classification problems. Classification models for high-dimensional gene expression datasets in particular have attracted many researchers seeking marker genes that distinguish cancer cells from their corresponding normal cells. However, medical datasets often have skewed class distributions, in which at least one class has relatively few observations. A classifier induced from such an imbalanced dataset typically has high accuracy for the majority class and poor prediction for the minority class. In this study, we focus on an SVM classifier with a Gaussian radial basis kernel for binary classification. To take advantage of the SVM and achieve the best generalization ability, we address two important problems: class imbalance and parameter selection during SVM parameter optimization. First, we propose a novel adjustment method, called b-SVM, for adjusting the cutoff threshold of the SVM. Second, we propose a fast and simple approach, called Min-max gamma selection, to optimize the model parameters of SVMs without carrying out an extensive k-fold cross-validation. An extensive comparison with a standard SVM and well-known existing methods is carried out on simulated and real datasets to evaluate the performance of the proposed algorithms. The experimental results show that the proposed algorithms outperform over-sampling techniques and existing SVM-based solutions. The study also shows that the proposed Min-max gamma selection is at least 10 times faster than cross-validation selection, based on the average running time on six real datasets.
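The effect of moving an SVM's cutoff threshold on imbalanced data can be illustrated with a minimal sketch. This is not the b-SVM procedure, whose details the abstract does not give. It assumes we already have decision values from some trained classifier, and it picks the cutoff that maximizes the geometric mean of sensitivity and specificity (G-mean). The function names, the toy scores, and the G-mean criterion are all assumptions made for illustration.

```python
import math

def sensitivity_specificity(scores, labels, cutoff=0.0):
    """Predict +1 when score > cutoff; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == -1 and s <= cutoff)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg

def g_mean(scores, labels, cutoff=0.0):
    """Geometric mean of sensitivity and specificity at the given cutoff."""
    sens, spec = sensitivity_specificity(scores, labels, cutoff)
    return math.sqrt(sens * spec)

def best_cutoff(scores, labels):
    """Hypothetical selection rule: try the default cutoff 0.0 and the
    midpoints between consecutive sorted scores; keep the best G-mean."""
    ordered = sorted(set(scores))
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda c: g_mean(scores, labels, c))

# Made-up decision values: 8 majority (-1) samples and 2 minority (+1) samples.
scores = [-2.1, -1.8, -1.5, -1.2, -1.0, -0.9, -0.7, -0.6, -0.4, 0.3]
labels = [-1, -1, -1, -1, -1, -1, -1, -1, 1, 1]

default = sensitivity_specificity(scores, labels)          # cutoff at 0.0
shifted = sensitivity_specificity(scores, labels, best_cutoff(scores, labels))
print("default cutoff:", default)
print("shifted cutoff:", shifted)
```

With the default cutoff of 0, the toy classifier is 90% accurate overall but detects only one of the two minority samples. Shifting the cutoff toward the majority side recovers the second one. In practice the cutoff should be chosen on data held out from the final evaluation.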

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3174/10048125/cf2917ef8e29/genes-14-00583-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3174/10048125/839f2b71552a/genes-14-00583-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3174/10048125/9145b2b13f82/genes-14-00583-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3174/10048125/918cf898b18d/genes-14-00583-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3174/10048125/53e79a46cc27/genes-14-00583-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3174/10048125/566056fc83b7/genes-14-00583-g006.jpg
