Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging.

Affiliations

Department of Medical Imaging, University of Toronto, Toronto, ON, M5T 1W7, Canada.

Department of Mathematics, Statistics and Computer Science, St Francis Xavier University, Antigonish, NS, Canada.

Publication information

Int J Comput Assist Radiol Surg. 2020 Dec;15(12):2041-2048. doi: 10.1007/s11548-020-02260-6. Epub 2020 Sep 23.


DOI:10.1007/s11548-020-02260-6
PMID:32965624
Abstract

PURPOSE: Machine learning (ML) algorithms are well known to exhibit variations in prediction accuracy when provided with imbalanced training sets typically seen in medical imaging (MI) due to the imbalanced ratio of pathological and normal cases. This paper presents a thorough investigation of the effects of class imbalance and methods for mitigating class imbalance in ML algorithms applied to MI.

METHODS: We first selected five classes from the Image Retrieval in Medical Applications (IRMA) dataset, performed multiclass classification using the random forest model (RFM), and then performed binary classification using convolutional neural network (CNN) on a chest X-ray dataset. An imbalanced class was created in the training set by varying the number of images in that class. Methods tested to mitigate class imbalance included oversampling, undersampling, and changing class weights of the RFM. Model performance was assessed by overall classification accuracy, overall F1 score, and specificity, recall, and precision of the imbalanced class.

RESULTS: A close-to-balanced training set resulted in the best model performance, and a large imbalance with overrepresentation was more detrimental to model performance than underrepresentation. Oversampling and undersampling methods were both effective in mitigating class imbalance, and efficacy of oversampling techniques was class specific.

CONCLUSION: This study systematically demonstrates the effect of class imbalance on two public X-ray datasets on RFM and CNN, making these findings widely applicable as a reference. Furthermore, the methods employed here can guide researchers in assessing and addressing the effects of class imbalance, while considering the data-specific characteristics to optimize imbalance mitigating methods.
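The abstract names three mitigation strategies (oversampling, undersampling, class weights) and three per-class metrics (specificity, recall, precision) without giving implementation details. The sketch below illustrates one plausible reading using only the Python standard library. Everything in it is an assumption for illustration, not the authors' method: the resampling is purely random, the weighting follows the common inverse-frequency heuristic `n_samples / (n_classes * class_count)`, and the function names, class labels, and toy data are all hypothetical.

```python
import random
from collections import Counter


def group_by_class(samples, labels):
    """Map each label to the list of samples carrying it."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def resample(samples, labels, mode, seed=0):
    """Randomly over- or undersample so that every class has the same size.

    mode="over":  draw extra samples with replacement up to the largest class.
    mode="under": keep a random subset down to the smallest class.
    (Random resampling is an assumption; the abstract names no technique.)
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    sizes = [len(xs) for xs in groups.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    pairs = []
    for y, xs in groups.items():
        if mode == "over":
            chosen = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        else:
            chosen = rng.sample(xs, target)
        pairs.extend((x, y) for x in chosen)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def class_metrics(y_true, y_pred, positive):
    """Specificity, recall, and precision for one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "specificity": safe_div(tn, tn + fp),
        "recall": safe_div(tp, tp + fn),
        "precision": safe_div(tp, tp + fp),
    }


# Toy data (hypothetical): 90 "normal" cases versus 10 "pathological" cases.
labels = ["normal"] * 90 + ["pathological"] * 10
samples = list(range(len(labels)))

_, over_labels = resample(samples, labels, "over")    # 90 per class
_, under_labels = resample(samples, labels, "under")  # 10 per class
weights = balanced_class_weights(labels)              # rarer class weighs more

# Toy predictions (hypothetical) with class 1 as the class of interest.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = class_metrics(y_true, y_pred, positive=1)

print(Counter(over_labels), Counter(under_labels), weights, metrics)
```

In practice the resampling would be applied to the training split only, so that the test set keeps its original class distribution; the weights dictionary has the shape that tree-based classifiers commonly accept as per-class weights.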
