Exploring the Interplay of Dataset Size and Imbalance on CNN Performance in Healthcare: Using X-rays to Identify COVID-19 Patients.

Authors

Davidian Moshe, Lahav Adi, Joshua Ben-Zion, Wand Ori, Lurie Yotam, Mark Shlomo

Affiliations

Guilford Glazer Faculty of Business and Management, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.

Software Engineering Department, SCE-Shamoon College of Engineering, Beer-Sheva 84100, Israel.

Publication information

Diagnostics (Basel). 2024 Aug 8;14(16):1727. doi: 10.3390/diagnostics14161727.

PMID: 39202215
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353409/
Abstract

INTRODUCTION: Convolutional Neural Network (CNN) systems in healthcare are affected by both class imbalance and dataset size. This article examines the impact of dataset size, class imbalance, and their interplay on CNN systems, focusing on training-set size versus imbalance, a perspective that differs from the prevailing literature. It also addresses scenarios with more than two classes, which are often overlooked but common in practice.

METHODS: A CNN was first developed to classify lung diseases from X-ray images, distinguishing healthy individuals from COVID-19 patients. The model was later extended to include pneumonia patients. Numerous experiments were conducted with varied dataset sizes and imbalance ratios for both binary and ternary classification, and several performance metrics were measured to validate the model's efficacy.

RESULTS: Increasing dataset size improved CNN performance, but the improvement saturated beyond a certain size. A novel finding is that the class balance ratio influences performance more strongly than dataset size. Three-class classification behaved like binary classification, underscoring the importance of balanced datasets for accurate classification.

CONCLUSIONS: Balanced class representation in the dataset is crucial for optimal CNN performance in healthcare, which challenges the conventional focus on dataset size. Balanced datasets improve classification accuracy in both two-class and three-class scenarios, highlighting the need for data-balancing techniques to improve model reliability and effectiveness.

MOTIVATION: Our study is motivated by a scenario in which 100 patient samples are available and there are two options: a balanced dataset of 200 samples (100 patients and 100 healthy individuals) or an imbalanced dataset of 500 samples (100 patients and 400 healthy individuals). We aim to show which option is preferable, based on the interplay between dataset size and imbalance, and to inform stakeholders seeking optimal model performance.

LIMITATIONS: Recognizing the limited generalizability of results from a single model, we note that further studies on diverse datasets are needed.
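The METHODS paragraph describes experiments over a grid of dataset sizes and imbalance ratios for two and three classes. The exact sampling procedure is not given in this abstract, so the following is only a minimal sketch under stated assumptions: samples are hypothetical image ids grouped by label, each subset is drawn without replacement, and per-class counts are derived from the requested class fractions with largest-remainder rounding so the subset size is always exact. The function `make_subset` and the pool contents are illustrative, not the authors' code.

```python
import random
from collections import Counter


def make_subset(pool, class_fractions, total, seed=0):
    """Draw `total` (sample_id, label) pairs whose label shares follow `class_fractions`.

    pool: dict mapping label -> list of sample ids (hypothetical image ids).
    class_fractions: dict mapping label -> non-negative fraction; must sum to 1.
    Samples are drawn without replacement, so a class cannot be over-requested.
    """
    if any(f < 0 for f in class_fractions.values()):
        raise ValueError("class fractions must be non-negative")
    if abs(sum(class_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("class fractions must sum to 1")

    # Largest-remainder rounding: floor every share, then give the leftover
    # samples to the classes with the largest fractional parts, so the
    # per-class counts always add up to exactly `total`.
    raw = {c: total * f for c, f in class_fractions.items()}
    counts = {c: int(r) for c, r in raw.items()}
    leftover = total - sum(counts.values())
    by_fraction = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
    for c in by_fraction[:leftover]:
        counts[c] += 1

    rng = random.Random(seed)
    subset = []
    for c, k in counts.items():
        if k > len(pool[c]):
            raise ValueError(f"need {k} {c!r} samples but the pool has {len(pool[c])}")
        subset.extend((s, c) for s in rng.sample(pool[c], k))
    rng.shuffle(subset)  # mix classes so the list is not sorted by label
    return subset


if __name__ == "__main__":
    # Hypothetical pool standing in for a real X-ray dataset.
    pool = {
        "covid": [f"covid_{i:04d}.png" for i in range(600)],
        "healthy": [f"healthy_{i:04d}.png" for i in range(1000)],
    }
    # Sweep dataset size and COVID-19 share, as in a size-by-imbalance grid.
    for total in (200, 500, 1000):
        for covid_share in (0.5, 0.2, 0.1):
            subset = make_subset(pool, {"covid": covid_share, "healthy": 1 - covid_share}, total)
            print(total, covid_share, dict(Counter(label for _, label in subset)))
```

Fixing the seed makes each grid cell reproducible; a three-class run only needs a third label (for example pneumonia) in both `pool` and `class_fractions`.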
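The CONCLUSIONS call for data-balancing techniques without naming a specific one. One common option, assumed here and not necessarily what the authors used, is to reweight the loss by inverse class frequency, w_c = N / (K · n_c), which is the same heuristic scikit-learn applies for `class_weight="balanced"`. With these weights every class contributes the same total weight to the loss regardless of how many samples it has. The function names and the dict-based probability format are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights w_c = N / (K * n_c).

    N is the number of samples, K the number of classes and n_c the count of
    class c, so every class ends up with the same total weight N / K.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the summed weights.

    probs: list of dicts mapping label -> predicted probability (hypothetical
    model output); labels: the true label of each sample.
    """
    losses = [weights[y] * -math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / sum(weights[y] for y in labels)


# The imbalanced option from the MOTIVATION paragraph: 100 patients, 400 healthy.
labels = ["covid"] * 100 + ["healthy"] * 400
weights = inverse_frequency_weights(labels)
print(weights)  # {'covid': 2.5, 'healthy': 0.625}
```

Normalising by the summed weights mirrors the "mean" reduction convention of common deep-learning losses; dividing by the sample count instead is another valid choice that only rescales the loss.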
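A little arithmetic on the MOTIVATION scenario shows why class imbalance also distorts headline metrics. The sketch below is illustrative only and does not reproduce the paper's experiments, which concern training-set composition: it assumes a hypothetical baseline that labels every image as healthy and compares plain accuracy with balanced accuracy (the mean of per-class recall, as in scikit-learn's `balanced_accuracy_score`) on the two candidate datasets.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / n for c, n in totals.items()) / len(totals)


# The two candidate datasets from the MOTIVATION paragraph.
datasets = {"balanced": (100, 100), "imbalanced": (100, 400)}
for name, (n_covid, n_healthy) in datasets.items():
    y_true = ["covid"] * n_covid + ["healthy"] * n_healthy
    y_pred = ["healthy"] * len(y_true)  # hypothetical baseline: always "healthy"
    print(name, len(y_true), accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
# balanced 200 0.5 0.5
# imbalanced 500 0.8 0.5
```

Plain accuracy rises to 80% on the imbalanced 500-sample set even though the baseline never detects a single COVID-19 case, while balanced accuracy stays at 50% on both sets.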
