Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis.

Affiliations

School of Information and Technology, Northwest University, Xi'an, China.

School of Information Engineering, Yulin University, Yulin, China.

Publication information

Comput Assist Surg (Abingdon). 2019 Oct;24(sup2):62-72. doi: 10.1080/24699322.2019.1649074. Epub 2019 Aug 12.

DOI:10.1080/24699322.2019.1649074

PMID:31403330

Abstract

To address the two-class imbalanced classification problem in breast cancer diagnosis, a hybrid model based on sample selection is proposed that combines Random Over Sampling Examples (ROSE), K-means, and a support vector machine (RK-SVM). ROSE is used to balance the dataset and thereby improve the diagnostic accuracy of the support vector machine (SVM). Clustering supplies an additional sample selection factor that encourages selecting samples near the class boundary; its purpose is to reduce the risk of removing useful samples and to improve the efficiency of sample selection. To test the new hybrid classifier, it is applied to breast cancer datasets and to three other datasets from the University of California Irvine (UCI) machine learning repository that are commonly used in class-imbalanced learning. Extensive experimental results show that the proposed hybrid method outperforms most of the competing algorithms in terms of G-mean and accuracy. The experimental results also show that the method performs well on binary problems.
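The abstract reports results in terms of G-mean as well as accuracy and relies on ROSE to balance the classes. The sketch below is illustrative only and is not the authors' implementation. It shows why G-mean (the geometric mean of sensitivity and specificity) is more informative than accuracy on imbalanced data, and it gives a simplified ROSE-style smoothed bootstrap. The function names, the toy labels, and the fixed `bandwidth` value are assumptions made for this example; ROSE itself derives its smoothing from the data, and the K-means selection step and the SVM are not reproduced here.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def smoothed_oversample(minority, n_new, bandwidth=0.1, seed=0):
    """Simplified ROSE-style smoothed bootstrap (assumption: fixed bandwidth).

    Each synthetic point is a randomly chosen minority sample plus
    independent Gaussian noise on every feature.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        synthetic.append([x + rng.gauss(0.0, bandwidth) for x in base])
    return synthetic


# Toy labels (hypothetical): 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95

# A classifier that always predicts "negative" looks accurate but finds no cases.
always_negative = [0] * 100
acc_trivial = accuracy(y_true, always_negative)
gm_trivial = g_mean(y_true, always_negative)

# A classifier that finds 4 of 5 positives at the cost of 10 false alarms.
better = [1] * 4 + [0] * 1 + [1] * 10 + [0] * 85
acc_better = accuracy(y_true, better)
gm_better = g_mean(y_true, better)

# Oversample a toy two-feature minority class until it matches the majority size.
minority_points = [[0.9, 1.1], [1.0, 1.0], [1.2, 0.8], [0.8, 1.3], [1.1, 0.9]]
synthetic_points = smoothed_oversample(minority_points, n_new=90)

print(f"always negative: accuracy={acc_trivial:.2f}, G-mean={gm_trivial:.2f}")
print(f"better model:    accuracy={acc_better:.2f}, G-mean={gm_better:.2f}")
print(f"minority size after oversampling: {len(minority_points) + len(synthetic_points)}")
```

On these toy labels the trivial classifier scores higher on accuracy than the second classifier, while its G-mean is zero because it detects no positive cases. That contrast is the reason imbalanced-learning studies report G-mean alongside accuracy.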

Similar articles

Prediction of Breast Cancer from Imbalance Respect Using Cluster-Based Undersampling Method.

J Healthc Eng. 2019 Oct 16;2019:7294582. doi: 10.1155/2019/7294582. eCollection 2019.

Affinity and class probability-based fuzzy support vector machine for imbalanced data sets.

Neural Netw. 2020 Feb;122:289-307. doi: 10.1016/j.neunet.2019.10.016. Epub 2019 Nov 2.

Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model.

Math Biosci Eng. 2023 Sep 15;20(10):17672-17701. doi: 10.3934/mbe.2023786.

Efficient Selection of Gaussian Kernel SVM Parameters for Imbalanced Data.

Genes (Basel). 2023 Feb 25;14(3):583. doi: 10.3390/genes14030583.

Diagnosis of Brain Metastases from Lung Cancer Using a Modified Electromagnetism like Mechanism Algorithm.

J Med Syst. 2016 Jan;40(1):35. doi: 10.1007/s10916-015-0367-3. Epub 2015 Nov 14.

Knowledge mining from clinical datasets using rough sets and backpropagation neural network.

Comput Math Methods Med. 2015;2015:460189. doi: 10.1155/2015/460189. Epub 2015 Mar 4.

Vicinal support vector classifier using supervised kernel-based clustering.

Artif Intell Med. 2014 Mar;60(3):189-96. doi: 10.1016/j.artmed.2014.01.003. Epub 2014 Feb 7.

The construction of support vector machine classifier using the firefly algorithm.

Comput Intell Neurosci. 2015;2015:212719. doi: 10.1155/2015/212719. Epub 2015 Feb 23.

Detection of breast cancer of various clinical stages based on serum FT-IR spectroscopy combined with multiple algorithms.

Photodiagnosis Photodyn Ther. 2021 Mar;33:102199. doi: 10.1016/j.pdpdt.2021.102199. Epub 2021 Jan 27.

Cited by

Early detection and analysis of accurate breast cancer for improved diagnosis using deep supervised learning for enhanced patient outcomes.

PeerJ Comput Sci. 2025 Apr 24;11:e2784. doi: 10.7717/peerj-cs.2784. eCollection 2025.

Tongue coating microbial communities vary in children with Henoch-Schönlein purpura.

Sci Rep. 2025 Feb 14;15(1):5466. doi: 10.1038/s41598-025-88610-3.

STANet: A Novel Spatio-Temporal Aggregation Network for Depression Classification with Small and Unbalanced FMRI Data.

Tomography. 2024 Nov 28;10(12):1895-1914. doi: 10.3390/tomography10120138.

Predicting the risk of mortality and rehospitalization in heart failure patients: A retrospective cohort study by machine learning approach.

Clin Cardiol. 2024 Feb;47(2):e24239. doi: 10.1002/clc.24239.

In-Hospital Mortality Prediction Model for Critically Ill Older Adult Patients Transferred from the Emergency Department to the Intensive Care Unit.

Risk Manag Healthc Policy. 2023 Nov 22;16:2555-2563. doi: 10.2147/RMHP.S442138. eCollection 2023.

Machine learning prediction models for different stages of non-small cell lung cancer based on tongue and tumor marker: a pilot study.

BMC Med Inform Decis Mak. 2023 Sep 29;23(1):197. doi: 10.1186/s12911-023-02266-5.

Predictive association of gut microbiome and NLR in anemic low middle-income population of Odisha- a cross-sectional study.

Front Nutr. 2023 Jul 13;10:1200688. doi: 10.3389/fnut.2023.1200688. eCollection 2023.

Use Test of Automated Machine Learning in Cancer Diagnostics.

Diagnostics (Basel). 2023 Jul 8;13(14):2315. doi: 10.3390/diagnostics13142315.

An Integrated Glycosylation Signature of Rheumatoid Arthritis.

Biomolecules. 2023 Jul 12;13(7):1106. doi: 10.3390/biom13071106.

Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data.

JAMIA Open. 2023 May 31;6(2):ooad033. doi: 10.1093/jamiaopen/ooad033. eCollection 2023 Jul.
