RSMOTE: improving classification performance over imbalanced medical datasets.

Author information

Naseriparsa Mehdi, Al-Shammari Ahmed, Sheng Ming, Zhang Yong, Zhou Rui

Affiliations

Swinburne University of Technology, Hawthorn, Australia.

University of Al-Qadisiyah, Al Diwaniyah, Iraq.

Publication information

Health Inf Sci Syst. 2020 Jun 12;8(1):22. doi: 10.1007/s13755-020-00112-w. eCollection 2020 Dec.

DOI: 10.1007/s13755-020-00112-w
PMID: 32549976
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7292850/
Abstract

INTRODUCTION

Medical diagnosis is a crucial step in patient treatment. However, diagnosis is prone to bias when datasets are imbalanced. To overcome the imbalanced dataset problem, the synthetic minority oversampling technique (SMOTE) was proposed; it generates new synthetic samples at the data level to balance the minority and majority classes. However, the synthetic samples are generated on a random basis, which causes the class mixture problem and thus degrades classification performance and biases diagnosis.
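The random generation step described above can be illustrated with the classic SMOTE interpolation rule: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point at a random position on the line segment between them. The following is a minimal sketch under stated assumptions (plain tuples as feature vectors, Euclidean distance, hypothetical function names `k_nearest` and `smote`, default `k=5`); it is not code from the RSMOTE paper. Because neither the base sample nor the neighbour is checked against nearby majority samples, a synthetic point can land inside majority territory, which is the class mixture problem mentioned above.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` by Euclidean distance."""
    return sorted(candidates, key=lambda q: math.dist(point, q))[:k]


def smote(minority, n_synthetic, k=5, seed=0):
    """Classic SMOTE sketch: interpolate between a random minority sample and
    a randomly chosen one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]  # exclude the base sample itself
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # random position on the segment, uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(smote(minority, n_synthetic=3, k=2))
```

Every synthetic point lies on a segment between two existing minority samples; nothing in the rule looks at where majority samples are.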

PURPOSE

To overcome the shortcomings of SMOTE, several modified methods have been proposed that generate synthetic samples along the line segments between selected minority samples. Most of these methods adopt one of two policies for selecting the minority samples used to generate synthetic samples: borderline region sampling or safe region sampling. However, both policies suffer from the over-generalisation problem. We propose a modified SMOTE-based resampling method called RSMOTE to alleviate the medical imbalanced dataset problem. We provide an in-depth analysis and verify the performance of RSMOTE on imbalanced medical datasets.
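The abstract names the two selection policies but not specific methods. As an illustration only, the sketch below uses the labelling rule of Borderline-SMOTE, one well-known borderline-region method: a minority sample whose m nearest neighbours (over the whole dataset) are all majority is treated as noise, one with at least half majority neighbours as borderline, and the rest as safe. A borderline policy would then oversample the "borderline" group and a safe-region policy the "safe" group. The function name `label_minority` and the default `m=5` are assumptions, not taken from the paper.

```python
import math


def label_minority(minority, majority, m=5):
    """Borderline-SMOTE-style labelling (illustrative): count majority samples
    among each minority sample's m nearest neighbours in the whole dataset."""
    data = [(p, True) for p in minority] + [(p, False) for p in majority]
    labels = []
    for i, x in enumerate(minority):
        others = data[:i] + data[i + 1:]  # data[i] is minority[i] itself
        nearest = sorted(others, key=lambda item: math.dist(x, item[0]))[:m]
        n_majority = sum(1 for _, is_minority in nearest if not is_minority)
        if n_majority == m:
            labels.append("noise")       # surrounded only by majority samples
        elif n_majority >= m / 2:
            labels.append("borderline")  # target of borderline-region policies
        else:
            labels.append("safe")        # target of safe-region policies
    return labels
```

Each decision here is local: a sample is judged only by its m neighbours, with no view of the minority class as a whole.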

METHODS

In this paper, the proposed RSMOTE divides the minority sample domain into four regions (normal, semi-normal, semi-critical, and critical) based on an analysis of minority sample density. RSMOTE determines the region of each minority sample globally and applies resampling near a specific group of samples.
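The abstract does not define the density measure or the cut points between the four regions, so the sketch below is hypothetical throughout. It scores each minority sample by the share of minority samples among its k nearest neighbours in the full dataset, then ranks all minority samples together and splits the ranking into four equal-sized groups, densest first. Ranking over the whole minority class is one possible reading of a "global" region decision; the actual RSMOTE criteria may differ. `minority_density`, `assign_regions`, and `k` are illustrative names.

```python
import math

REGIONS = ("normal", "semi-normal", "semi-critical", "critical")


def minority_density(minority, majority, k=5):
    """Hypothetical density score: fraction of minority samples among each
    minority sample's k nearest neighbours in the full dataset."""
    data = [(p, True) for p in minority] + [(p, False) for p in majority]
    scores = []
    for i, x in enumerate(minority):
        others = data[:i] + data[i + 1:]  # drop the sample itself
        nearest = sorted(others, key=lambda item: math.dist(x, item[0]))[:k]
        scores.append(sum(is_minority for _, is_minority in nearest) / k)
    return scores


def assign_regions(scores):
    """Hypothetical global split: rank all minority samples by density and
    cut the ranking into four equal-sized groups, densest group first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    regions = [None] * len(scores)
    for rank, i in enumerate(order):
        regions[i] = REGIONS[rank * 4 // len(scores)]
    return regions
```

Using quantiles of the whole score distribution, rather than fixed per-neighbourhood thresholds, means every sample's region depends on how it compares with all other minority samples.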

RESULTS

Our analysis and experiments verify that if synthetic samples are generated in regions with high minority sample density, classification performance improves because the risk of class mixture is low. Unlike some safe region methods, RSMOTE decides the region of minority samples on a global basis, thus removing the over-generalisation problem. Classic and additional evaluation metrics are used to measure the effectiveness of the modified method: Recall, FP Rate, Precision, F-Measure, ROC area, and Average Aggregated Metric. We carried out experiments on various imbalanced medical datasets.
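The classic metrics listed above can be computed from a binary confusion matrix and the classifier's scores. This sketch assumes the minority class is the positive label, takes F-Measure as F1 (the harmonic mean of precision and recall), and computes ROC area as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), which equals the area under the ROC curve. The Average Aggregated Metric is not defined in the abstract and is not reproduced here; the function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Recall, FP rate, precision and F1 for one positive (minority) label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"recall": recall, "fp_rate": fp_rate,
            "precision": precision, "f_measure": f_measure}


def roc_area(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Recall and FP rate are computed per class, so they are not inflated by a large majority class in the way plain accuracy is.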

CONCLUSION

Based on the minority sample density analysis, we propose the RSMOTE method, which divides the minority sample domain into four regions. RSMOTE includes four resampling methods, each of which carries out resampling on a specific region. According to the experimental results, resampling on regions with high minority sample density obtained better results, while resampling on regions with lower minority sample density obtained inferior results. Thus, we conclude that RSMOTE is a more flexible resampling method for imbalanced medical datasets, capable of generating samples with various minority sample densities.
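Each of the four resampling methods targets one region. The abstract does not say how neighbours are chosen or how many samples each method generates, so the sketch below assumes a region-restricted variant of the SMOTE interpolation rule: base samples are drawn only from the target region, and neighbours from the whole minority class. Region labels are taken as given (for example, from a density-based partition); `resample_region`, `resample_all`, `k`, and `seed` are hypothetical names, not the paper's.

```python
import math
import random

REGIONS = ("normal", "semi-normal", "semi-critical", "critical")


def resample_region(minority, regions, target, n_synthetic, k=5, seed=0):
    """Hypothetical region-restricted SMOTE: base samples come only from
    `target`; neighbours may be any other minority sample."""
    if target not in REGIONS:
        raise ValueError(f"unknown region: {target}")
    base_idx = [i for i, r in enumerate(regions) if r == target]
    if not base_idx:
        return []  # nothing to resample in an empty region
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.choice(base_idx)
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        nn = rng.choice(sorted(others, key=lambda q: math.dist(x, q))[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def resample_all(minority, regions, n_synthetic, k=5, seed=0):
    """One resampling variant per region, to compare them side by side."""
    return {r: resample_region(minority, regions, r, n_synthetic, k, seed)
            for r in REGIONS}
```

Training the same classifier on each of the four outputs and comparing the metrics is one way to reproduce the kind of region-by-region comparison the conclusion describes.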
