Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging.

Affiliations

Department of Medical Imaging, University of Toronto, Toronto, ON, M5T 1W7, Canada.

Department of Mathematics, Statistics and Computer Science, St Francis Xavier University, Antigonish, NS, Canada.

Publication information

Int J Comput Assist Radiol Surg. 2020 Dec;15(12):2041-2048. doi: 10.1007/s11548-020-02260-6. Epub 2020 Sep 23.

Abstract

PURPOSE

Machine learning (ML) algorithms are well known to exhibit variations in prediction accuracy when provided with imbalanced training sets typically seen in medical imaging (MI) due to the imbalanced ratio of pathological and normal cases. This paper presents a thorough investigation of the effects of class imbalance and methods for mitigating class imbalance in ML algorithms applied to MI.

METHODS

We first selected five classes from the Image Retrieval in Medical Applications (IRMA) dataset and performed multiclass classification using the random forest model (RFM). We then performed binary classification using a convolutional neural network (CNN) on a chest X-ray dataset. An imbalanced class was created in the training set by varying the number of images in that class. Methods tested to mitigate class imbalance included oversampling, undersampling, and changing class weights of the RFM. Model performance was assessed by overall classification accuracy, overall F1 score, and specificity, recall, and precision of the imbalanced class.
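The three mitigation strategies named above can be illustrated with a minimal sketch. The abstract does not say which oversampling or undersampling variants were used, so this example assumes plain random resampling. It also assumes the common "balanced" weighting heuristic, n_samples / (n_classes * class_count). All function names and the toy label counts are hypothetical, not taken from the study.

```python
import random
from collections import Counter

def random_oversample(labels, target_class, target_count, seed=0):
    """Return sample indices with target_class duplicated (drawn with
    replacement) until it has target_count members. Assumed variant."""
    rng = random.Random(seed)
    members = [i for i, y in enumerate(labels) if y == target_class]
    if not members:
        raise ValueError("target_class has no samples to duplicate")
    extra = max(0, target_count - len(members))
    return list(range(len(labels))) + [rng.choice(members) for _ in range(extra)]

def random_undersample(labels, target_class, target_count, seed=0):
    """Return sample indices with target_class randomly reduced
    (without replacement) to at most target_count members."""
    rng = random.Random(seed)
    members = [i for i, y in enumerate(labels) if y == target_class]
    others = [i for i, y in enumerate(labels) if y != target_class]
    kept = rng.sample(members, min(target_count, len(members)))
    return sorted(others + kept)

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * class_count). Assumed heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

# Hypothetical toy training labels: 90 normal, 10 pathological.
labels = ["normal"] * 90 + ["pathological"] * 10
over = random_oversample(labels, "pathological", 90)
under = random_undersample(labels, "normal", 10)
weights = balanced_class_weights(labels)
print(Counter(labels[i] for i in over))
print(Counter(labels[i] for i in under))
print(weights)
```

The returned index lists would be used to select rows of the training set only. Resampling the test set would distort the reported metrics.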

RESULTS

A close-to-balanced training set resulted in the best model performance, and a large imbalance with overrepresentation was more detrimental to model performance than underrepresentation. Oversampling and undersampling methods were both effective in mitigating class imbalance, and efficacy of oversampling techniques was class specific.
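Comparisons such as these depend on per-class metrics, which can be computed by treating the imbalanced class as positive and every other class as negative. The sketch below is one way to do that, with a hypothetical function name and toy predictions. The abstract does not state how the overall F1 score was averaged, so only the single-class F1 is shown.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Compute accuracy plus precision, recall, specificity, and F1
    for one class treated as positive against all other classes."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical toy labels: class 1 is the imbalanced class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = one_vs_rest_metrics(y_true, y_pred, positive=1)
print(metrics)
```

Reporting specificity alongside recall matters here because overall accuracy can stay high even when the minority class is rarely predicted.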

CONCLUSION

This study systematically demonstrates the effect of class imbalance on RFM and CNN performance using two public X-ray datasets, making these findings widely applicable as a reference. Furthermore, the methods employed here can guide researchers in assessing and addressing the effects of class imbalance, while considering data-specific characteristics to optimize imbalance-mitigating methods.
