
Exploring the Interplay of Dataset Size and Imbalance on CNN Performance in Healthcare: Using X-rays to Identify COVID-19 Patients.

Authors

Davidian Moshe, Lahav Adi, Joshua Ben-Zion, Wand Ori, Lurie Yotam, Mark Shlomo

Affiliations

Guilford Glazer Faculty of Business and Management, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.

Software Engineering Department, SCE-Shamoon College of Engineering, Beer-Sheva 84100, Israel.

Publication information

Diagnostics (Basel). 2024 Aug 8;14(16):1727. doi: 10.3390/diagnostics14161727.

DOI:10.3390/diagnostics14161727
PMID:39202215
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353409/
Abstract

INTRODUCTION

Convolutional Neural Network (CNN) systems in healthcare are affected by class imbalance and by varying dataset sizes. This article examines the impact of dataset size, class imbalance, and their interplay on CNN systems, focusing on training-set size versus imbalance, a perspective that is uncommon in the prevailing literature. It also addresses scenarios with more than two classes, which are often overlooked but common in practical settings.

METHODS

Initially, a CNN was developed to classify lung diseases using X-ray images, distinguishing between healthy individuals and COVID-19 patients. Later, the model was expanded to include pneumonia patients. To evaluate performance, numerous experiments were conducted with varied data sizes and imbalance ratios for both binary and ternary classifications, measuring various indices to validate the model's efficacy.
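The experimental design above varies two knobs independently: the total number of training images and the share of each class. The abstract does not say how the subsets were drawn, so the following is only a minimal sketch of one way to build such subsets, assuming simple random sampling without replacement. The function name `subsample_indices`, the class labels, and the toy counts are all hypothetical.

```python
import random
from collections import Counter


def subsample_indices(labels, total_size, class_fractions, seed=0):
    """Pick `total_size` indices so that each class gets its requested share.

    labels: sequence of class labels, one per image.
    class_fractions: dict mapping label -> fraction of the subset (must sum to 1).
    """
    if abs(sum(class_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("class_fractions must sum to 1")
    rng = random.Random(seed)

    # Group the sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    chosen = []
    for label, fraction in class_fractions.items():
        wanted = round(total_size * fraction)
        pool = by_class.get(label, [])
        if wanted > len(pool):
            raise ValueError(f"not enough samples of class {label!r}")
        # Random sampling without replacement within the class.
        chosen.extend(rng.sample(pool, wanted))
    rng.shuffle(chosen)
    return chosen


# Toy label list standing in for an X-ray dataset (hypothetical counts).
labels = ["healthy"] * 1000 + ["covid"] * 300 + ["pneumonia"] * 300

# Binary case: 200 images, 80% healthy and 20% COVID-19.
binary = subsample_indices(labels, 200, {"healthy": 0.8, "covid": 0.2})
# Ternary case: 300 images, half healthy and a quarter each for the others.
ternary = subsample_indices(
    labels, 300, {"healthy": 0.5, "covid": 0.25, "pneumonia": 0.25}
)

binary_counts = Counter(labels[i] for i in binary)
ternary_counts = Counter(labels[i] for i in ternary)
print(binary_counts)
print(ternary_counts)
```

Sweeping `total_size` and `class_fractions` over a grid and training one model per cell would give a size-by-imbalance table of the kind the methods describe.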

RESULTS

The study revealed that increasing dataset size positively impacts CNN performance, but this improvement saturates beyond a certain size. A novel finding is that the data balance ratio influences performance more significantly than dataset size. The behavior of three-class classification mirrored that of binary classification, underscoring the importance of balanced datasets for accurate classification.

CONCLUSIONS

This study emphasizes that balanced class representation in datasets is crucial for optimal CNN performance in healthcare, challenging the conventional focus on dataset size. Balanced datasets improve classification accuracy in both two-class and three-class scenarios, highlighting the need for data-balancing techniques to improve model reliability and effectiveness.
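The abstract does not name a specific data-balancing technique. As one common illustration (an assumption, not the authors' method), inverse-frequency class weights make each class contribute equally to a weighted loss: the weight for class c is n_samples / (n_classes × count_c). The helper name `inverse_frequency_weights` and the toy label counts are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class so that rarer classes count for more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy unbalanced label list: 400 healthy, 100 COVID-19 (hypothetical).
labels = ["healthy"] * 400 + ["covid"] * 100
weights = inverse_frequency_weights(labels)
print(weights)  # {'healthy': 0.625, 'covid': 2.5}
```

With these weights, each class's total weight is the same (400 × 0.625 = 100 × 2.5 = 250), which is the sense in which the weighted dataset is "balanced". Resampling (undersampling the majority or oversampling the minority) is an alternative route to the same goal.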

MOTIVATION

Our study is motivated by a scenario in which 100 patient samples are available and there are two options for building the training set: a balanced dataset of 200 samples, or an unbalanced dataset of 500 samples (400 of them from healthy individuals). We aim to provide insights into the better choice based on the interplay between dataset size and imbalance, for stakeholders interested in achieving optimal model performance.
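The arithmetic behind the two options can be made explicit. The sketch below assumes, as a reading of the scenario, that the balanced option pairs the 100 patient samples with 100 healthy samples, while the unbalanced option pairs them with 400 healthy samples. The helper `describe` and its field names are hypothetical.

```python
def describe(name, n_patients, n_healthy):
    """Summarize the size and class balance of one dataset option."""
    total = n_patients + n_healthy
    return {
        "name": name,
        "total": total,
        "patient_fraction": n_patients / total,
        "healthy_per_patient": n_healthy / n_patients,
    }


# Option A: smaller but balanced (assumed 100 patients + 100 healthy).
balanced = describe("balanced", n_patients=100, n_healthy=100)
# Option B: larger but unbalanced (100 patients + 400 healthy).
unbalanced = describe("unbalanced", n_patients=100, n_healthy=400)

for option in (balanced, unbalanced):
    print(option)
```

Option B is 2.5 times larger, but patients make up only 20% of it, compared with 50% in option A. The study's question is which of these two properties matters more for CNN performance.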

LIMITATIONS

Recognizing a single model's generalizability limitations, we assert that further studies on diverse datasets are needed.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/846035e3ded9/diagnostics-14-01727-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/7f85c1b0e85e/diagnostics-14-01727-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/ab4c6dabcbd5/diagnostics-14-01727-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/c0859e1944a3/diagnostics-14-01727-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/c7839f2b093d/diagnostics-14-01727-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/e4cd3616a90f/diagnostics-14-01727-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a65f/11353409/132ef82d082d/diagnostics-14-01727-g006.jpg

Similar articles

1
Exploring the Interplay of Dataset Size and Imbalance on CNN Performance in Healthcare: Using X-rays to Identify COVID-19 Patients.
Diagnostics (Basel). 2024 Aug 8;14(16):1727. doi: 10.3390/diagnostics14161727.
2
Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging.
Int J Comput Assist Radiol Surg. 2020 Dec;15(12):2041-2048. doi: 10.1007/s11548-020-02260-6. Epub 2020 Sep 23.
3
CNN-Bi-LSTM: A Complex Environment-Oriented Cattle Behavior Classification Network Based on the Fusion of CNN and Bi-LSTM.
Sensors (Basel). 2023 Sep 6;23(18):7714. doi: 10.3390/s23187714.
4
Batch-balanced focal loss: a hybrid solution to class imbalance in deep learning.
J Med Imaging (Bellingham). 2023 Sep;10(5):051809. doi: 10.1117/1.JMI.10.5.051809. Epub 2023 Jun 23.
5
An automated diagnosis and classification of COVID-19 from chest CT images using a transfer learning-based convolutional neural network.
Comput Biol Med. 2022 May;144:105383. doi: 10.1016/j.compbiomed.2022.105383. Epub 2022 Mar 10.
6
Classification of COVID-19 chest X-Ray and CT images using a type of dynamic CNN modification method.
Comput Biol Med. 2021 Jul;134:104425. doi: 10.1016/j.compbiomed.2021.104425. Epub 2021 Apr 29.
7
A hybrid feature weighted attention based deep learning approach for an intrusion detection system using the random forest algorithm.
PLoS One. 2024 May 23;19(5):e0302294. doi: 10.1371/journal.pone.0302294. eCollection 2024.
8
COVID-19 lateral flow test image classification using deep CNN and StyleGAN2.
Front Artif Intell. 2024 Jan 29;6:1235204. doi: 10.3389/frai.2023.1235204. eCollection 2023.
9
Application of high resolution computed tomography image assisted classification model of middle ear diseases based on 3D-convolutional neural network.
Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2022 Aug 28;47(8):1037-1048. doi: 10.11817/j.issn.1672-7347.2022.210704.
10
SVD-CLAHE boosting and balanced loss function for Covid-19 detection from an imbalanced Chest X-Ray dataset.
Comput Biol Med. 2022 Nov;150:106092. doi: 10.1016/j.compbiomed.2022.106092. Epub 2022 Sep 28.

Cited by

1
DCNN models with post-hoc interpretability for the automated detection of glossitis and OSCC on the tongue.
Sci Rep. 2025 Aug 29;15(1):31940. doi: 10.1038/s41598-025-16760-5.
