
An oversampling method for multi-class imbalanced data based on composite weights.

Affiliations

School of Automobile, Chang'an University, Xi'an, China.

College of Automobile Engineering, College of Humanities and Information Changchun University of Technology, Changchun, China.

Publication information

PLoS One. 2021 Nov 12;16(11):e0259227. doi: 10.1371/journal.pone.0259227. eCollection 2021.

DOI:10.1371/journal.pone.0259227
PMID:34767567
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8589211/
Abstract

To solve the oversampling problem of multi-class small samples and to improve their classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The designed oversampling algorithm sorts the data within each class of the dataset according to the distance from each original data point to the hyperplane. Furthermore, iterative sampling is performed within each class, and inter-class sampling is adopted at the boundaries of adjacent classes, according to a sampling weight composed of data density and data sorting. Finally, information assignment is performed on all newly generated samples. The algorithm is trained and tested on UCI imbalanced datasets, and the established composite metrics are used to evaluate the performance of the proposed algorithm and other algorithms within a comprehensive evaluation method. The results show that the proposed algorithm balances the multi-class imbalanced data in terms of quantity, and the newly generated data maintain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. It is concluded that this algorithm is highly practical and general for imbalanced multi-class samples.
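The abstract does not say which hyperplane is used, how density is measured, or how the two weight components are combined. The following is a minimal sketch of the ranking-plus-weighting step under stated assumptions: a given linear hyperplane w·x + b = 0 (for example one learned by a linear SVM), a k-nearest-neighbour density estimate, and a linear blend controlled by a hypothetical parameter `alpha`. The choice to give larger weights to denser points and to points closer to the hyperplane is also an assumption, not the authors' confirmed rule.

```python
import math


def hyperplane_distance(x, w, b):
    """Unsigned Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def knn_density(points, i, k):
    """Assumed density of points[i]: 1 / mean distance to its k nearest same-class neighbours."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    nearest = dists[:k]
    return 1.0 / (sum(nearest) / len(nearest) + 1e-12)  # epsilon guards duplicate points


def composite_weights(points, w, b, k=3, alpha=0.5):
    """Hypothetical composite sampling weight for one class (needs >= 2 points).

    weight_i = alpha * (density share) + (1 - alpha) * (rank share), where the
    points are sorted by distance to the hyperplane and the closest point gets
    the largest rank score. The returned weights sum to 1.
    """
    n = len(points)
    order = sorted(range(n), key=lambda i: hyperplane_distance(points[i], w, b))
    rank_score = [0.0] * n
    for r, i in enumerate(order):
        rank_score[i] = float(n - r)  # closest -> n, farthest -> 1
    density = [knn_density(points, i, k) for i in range(n)]
    rank_total, dens_total = sum(rank_score), sum(density)
    return [
        alpha * density[i] / dens_total + (1 - alpha) * rank_score[i] / rank_total
        for i in range(n)
    ]


# Usage: three points of one class, hyperplane x = 0, single nearest neighbour.
weights = composite_weights([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)], w=(1.0, 0.0), b=0.0, k=1)
# weights is approximately [0.45, 0.367, 0.183]
```

Normalising each component to a share before blending keeps `alpha` interpretable as the relative influence of density versus sorting position, whatever the scale of the raw distances.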

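For the generation step, the abstract states only that sampling is iterated within each class, that inter-class sampling happens at the boundaries of adjacent classes, and that new samples then receive "information assignment". The sketch below substitutes SMOTE-style linear interpolation for both kinds of sampling and nearest-neighbour labelling for the assignment step. The function names and the `max_gap = 0.5` limit are hypothetical illustrations, not the paper's definitions; the seed weights could come from any per-class weighting, such as a density/sorting composite.

```python
import math
import random


def interpolate(a, b, gap):
    """Point a + gap * (b - a) on the segment from a towards b."""
    return tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b))


def intra_class_samples(points, weights, n_new, rng):
    """SMOTE-style interpolation inside one class (needs >= 2 points).

    A seed is drawn with probability proportional to its weight and joined
    to its nearest same-class neighbour at a random position on the segment.
    """
    out = []
    for _ in range(n_new):
        i = rng.choices(range(len(points)), weights=weights, k=1)[0]
        j = min((j for j in range(len(points)) if j != i),
                key=lambda m: math.dist(points[i], points[m]))
        out.append(interpolate(points[i], points[j], rng.random()))
    return out


def inter_class_samples(own, other, weights, n_new, rng, max_gap=0.5):
    """Hypothetical boundary sampling between two adjacent classes.

    A weighted seed from `own` is joined to its nearest point in `other`;
    keeping gap < 0.5 places each new point on the seed's side of the midpoint.
    """
    out = []
    for _ in range(n_new):
        a = own[rng.choices(range(len(own)), weights=weights, k=1)[0]]
        b = min(other, key=lambda q: math.dist(a, q))
        out.append(interpolate(a, b, rng.random() * max_gap))
    return out


def assign_labels(new_points, labelled):
    """Assumed assignment rule: copy the label of the nearest original point."""
    return [min(labelled, key=lambda pl: math.dist(p, pl[0]))[1] for p in new_points]


# Usage: class A is the minority class, class B is its adjacent neighbour.
rng = random.Random(0)
class_a = [(0.0, 0.0), (1.0, 0.0)]
class_b = [(10.0, 0.0)]
new_a = intra_class_samples(class_a, [0.5, 0.5], n_new=3, rng=rng)
boundary = inter_class_samples(class_a, class_b, [0.5, 0.5], n_new=3, rng=rng)
originals = [(p, "A") for p in class_a] + [(p, "B") for p in class_b]
labels = assign_labels(new_a + boundary, originals)
```

Capping the boundary gap below one half is one simple way to keep synthetic boundary points from crossing into the neighbouring class; the paper may use a different rule.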


Similar articles

1. Distance Metric Based Oversampling Method for Bioinformatics and Performance Evaluation.
   J Med Syst. 2016 Jul;40(7):159. doi: 10.1007/s10916-016-0516-3. Epub 2016 May 16.
2. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.
   BioData Min. 2016 Dec 1;9:37. doi: 10.1186/s13040-016-0117-1. eCollection 2016.
3. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.
   BMC Bioinformatics. 2017 Mar 14;18(1):169. doi: 10.1186/s12859-017-1578-z.
4. A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data.
   J Biomed Inform. 2020 Jul;107:103465. doi: 10.1016/j.jbi.2020.103465. Epub 2020 Jun 5.
5. RSMOTE: improving classification performance over imbalanced medical datasets.
   Health Inf Sci Syst. 2020 Jun 12;8(1):22. doi: 10.1007/s13755-020-00112-w. eCollection 2020 Dec.
6. Outlier-SMOTE: A refined oversampling technique for improved detection of COVID-19.
   Intell Based Med. 2020 Dec;3:100023. doi: 10.1016/j.ibmed.2020.100023. Epub 2020 Dec 3.
7. A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification.
   IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3740-3753. doi: 10.1109/TNNLS.2022.3197156. Epub 2024 Feb 29.
8. A multiple combined method for rebalancing medical data with class imbalances.
   Comput Biol Med. 2021 Jul;134:104527. doi: 10.1016/j.compbiomed.2021.104527. Epub 2021 May 31.
9. An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data.
   Anal Chim Acta. 2014 Jan 2;806:117-27. doi: 10.1016/j.aca.2013.10.050. Epub 2013 Nov 6.

Cited by

1. A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems.
   Front Digit Health. 2024 Jul 26;6:1430245. doi: 10.3389/fdgth.2024.1430245. eCollection 2024.
2. Association of medial collateral ligament complex injuries with anterior cruciate ligament ruptures based on posterolateral tibial plateau injuries.
   Sports Med Open. 2023 Aug 8;9(1):70. doi: 10.1186/s40798-023-00611-6.
3. SMOTE-CD: SMOTE for compositional data.
   PLoS One. 2023 Jun 29;18(6):e0287705. doi: 10.1371/journal.pone.0287705. eCollection 2023.
4. Automatic Clustering and Classification of Coffee Leaf Diseases Based on an Extended Kernel Density Estimation Approach.
   Plants (Basel). 2023 Apr 10;12(8):1603. doi: 10.3390/plants12081603.