Exploring the Impact of Batch Size on Deep Learning Artificial Intelligence Models for Malaria Detection.

Author information

Muralidhar Rohit, Demory Michelle L, Kesselman Marc M

Affiliations

Medicine, Nova Southeastern University Dr. Kiran C. Patel College of Osteopathic Medicine, Fort Lauderdale, USA.

Medical Education, Nova Southeastern University Dr. Kiran C. Patel College of Allopathic Medicine, Fort Lauderdale, USA.

Publication information

Cureus. 2024 May 13;16(5):e60224. doi: 10.7759/cureus.60224. eCollection 2024 May.

Abstract

Introduction

Malaria is a major public health concern, especially in developing countries. It often presents with recurrent fever, malaise, and other nonspecific symptoms that can be mistaken for influenza. Light microscopy of peripheral blood smears is considered the gold standard diagnostic test for malaria, and delays in diagnosis can increase morbidity and mortality. Microscopy, however, can be time-consuming and is limited by the availability of skilled labor, by infrastructure, and by interobserver variability. Artificial intelligence (AI)-based tools for diagnostic screening can automate blood smear analysis without relying on a trained technician. Convolutional neural networks (CNNs), deep learning neural networks that can identify visual patterns, are being explored for abnormality detection in medical images. One parameter that can be optimized in CNN models is the batch size: the number of images processed together in one forward and backward pass during training. The choice of batch size in developing CNN-based malaria screening tools can affect model accuracy, training speed, and, ultimately, clinical usability. This study explores the impact of batch size on CNN model accuracy for malaria detection from thin blood smear images.

Methods

We used the publicly available "NIH-NLM-ThinBloodSmearsPf" dataset from the United States National Library of Medicine, which consists of blood smear images for Plasmodium falciparum. The collection contains 13,779 "parasitized" and 13,779 "uninfected" single-cell images. We created four datasets containing all images, each with a unique randomized subset of images held out for model testing. Using Python, four identical 10-layer CNN models were developed, each trained with a different batch size for 10 epochs against all four datasets, resulting in 16 sets of outputs. For each run, we collected model prediction accuracy, training time, and F1-score (the harmonic mean of precision and recall, used here to quantify model performance).
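To make the two quantities defined above concrete, the following is a minimal sketch rather than the authors' code. It shows how batch size determines the number of weight updates per epoch and how an F1-score is computed from confusion-matrix counts. The 80/20 train/test split and the intermediate batch sizes 32 and 64 are assumptions for illustration; the abstract names only batch sizes 16 and 128 and does not state the split.

```python
import math


def steps_per_epoch(num_train_images: int, batch_size: int) -> int:
    """Number of forward/backward passes (weight updates) in one epoch."""
    return math.ceil(num_train_images / batch_size)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# From the abstract: 13,779 parasitized + 13,779 uninfected images.
TOTAL_IMAGES = 13_779 * 2

# ASSUMPTION: the abstract does not give the train/test split.
ASSUMED_TRAIN_FRACTION = 0.8

# ASSUMPTION: only 16 and 128 are named in the abstract; 32 and 64 are guesses.
ASSUMED_BATCH_SIZES = (16, 32, 64, 128)

num_train = int(TOTAL_IMAGES * ASSUMED_TRAIN_FRACTION)

for batch_size in ASSUMED_BATCH_SIZES:
    steps = steps_per_epoch(num_train, batch_size)
    print(f"batch size {batch_size:>3}: {steps} updates per epoch")
```

Under these assumptions, a batch size of 16 performs eight times as many weight updates per epoch as a batch size of 128. That is consistent with the trade-off the study examines: more, noisier updates at the cost of longer training time.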
Results

All models produced F1-scores of 94% to 96%, with 10 of 16 instances producing an F1-score of 95%. After averaging the outputs of all four datasets by batch size, we observed that, as batch size increased from 16 to 128, the average combined count of false positives and false negatives increased by 15.4% (from 130 to 150), and the average model F1-score decreased by 1 percentage point (from 95.3% to 94.3%). The average training time decreased by 28.11% (from 1,556 to 1,119 seconds).

Conclusion

In each dataset, we observed a decrease of approximately 1 percentage point in F1-score as the batch size increased. Clinically, a 1% deviation at the population level can have a relatively significant impact on outcomes. Results from this study suggest that smaller batch sizes could improve accuracy in models with similar layer complexity and datasets, potentially resulting in better clinical outcomes. The reduced memory requirement of training with smaller batches also means that model training can be achieved with more economical hardware. Our findings suggest that smaller batch sizes could be evaluated for improvements in accuracy when developing an AI model to screen thin blood smears for malaria.
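The relative changes reported in the Results can be rechecked from the rounded figures in the abstract. The short sketch below is illustrative, and the helper name `relative_change` is our own. The training-time figure recomputed from the rounded values comes out near 28.08% rather than the reported 28.11%, presumably because the authors worked from unrounded timings; that explanation is an assumption.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100.0


# Rounded values taken from the abstract (batch size 16 -> 128).
errors_change = relative_change(130, 150)    # false positives + false negatives
time_change = relative_change(1556, 1119)    # training time in seconds
f1_point_change = 94.3 - 95.3                # absolute change, percentage points
f1_relative_change = relative_change(95.3, 94.3)

print(f"misclassifications: {errors_change:+.1f}%")
print(f"training time:      {time_change:+.2f}%")
print(f"F1-score:           {f1_point_change:+.1f} points "
      f"({f1_relative_change:+.2f}% relative)")
```

The F1-score drop is 1 percentage point in absolute terms, which corresponds to a relative decrease of roughly 1.05%.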
