Ravikumar Aswathy, Sriraman Harini
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India.
PeerJ Comput Sci. 2023 Mar 9;9:e1258. doi: 10.7717/peerj-cs.1258. eCollection 2023.
Pneumonia is a respiratory disease caused by bacteria; it affects many people, particularly in impoverished countries where pollution, unsanitary living conditions, overpopulation, and inadequate medical infrastructure are prevalent. Early detection is vital to enable curative therapy and improve survival chances. Chest X-ray imaging is the most common way of detecting pneumonia. However, analyzing chest X-rays is a complex process that is vulnerable to subjective variation. Moreover, the available data is growing exponentially, and training a model to predict pneumonia can take hours or even days. Timely prediction is important for better treatment outcomes. Existing approaches need better precision, and their computation time for predicting pneumonia is also much longer. There is therefore a need for early prediction: a system that uses X-ray image samples with continuous, unsupervised learning for early diagnosis.
In this article, model training is accelerated using a distributed data-parallel approach and the computational power of high-performance computing devices. This research aims to diagnose pneumonia from X-ray images with more precision, greater speed, and fewer processing resources. Distributed deep learning techniques are gaining popularity owing to the rising need for computational resources for deep learning models with many parameters. In contrast to conventional training methods, data-parallel training enables several compute nodes to train massive deep-learning models concurrently, improving training efficiency. Deploying the model in Spark addresses scalability and acceleration. Spark's distributed processing capability reads data from multiple nodes, and the results demonstrate that these techniques can drastically reduce training time, which is essential when dealing with large datasets.
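The data-parallel idea described above can be illustrated with a minimal sketch. This is not the article's Spark/CNN pipeline: it assumes a toy linear model, equal-sized data shards, and workers simulated sequentially in one process, and every function name (`gradient`, `split`, `train_data_parallel`) is hypothetical. It shows only the synchronous pattern in which each worker computes a gradient on its own shard and the averaged gradient updates a shared model.

```python
# Illustrative sketch of synchronous data-parallel training.
# Assumptions: toy linear model y ≈ w*x + b, equal-sized shards, and
# workers simulated one after another (a cluster would run them concurrently).

def gradient(w, b, shard):
    """Mean-squared-error gradient of y ≈ w*x + b over one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def split(data, num_workers):
    """Partition the dataset into equal-sized shards, one per worker."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def train_data_parallel(data, num_workers=4, lr=0.01, steps=5000):
    shards = split(data, num_workers)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [gradient(w, b, s) for s in shards]
        # Synchronous step: collect every worker's result, then average.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        # One shared model is updated with the averaged gradient.
        w -= lr * gw
        b -= lr * gb
    return w, b

data = [(float(x), 3.0 * x + 1.0) for x in range(8)]  # toy data: y = 3x + 1
w, b = train_data_parallel(data)
print(f"w = {w:.3f}, b = {b:.3f}")
```

With equal-sized shards, the average of the per-shard gradients equals the gradient over the full dataset, so the synchronous update follows the same path as single-node training while splitting the per-step work across workers.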
The proposed model makes predictions 1.5 times faster than the traditional CNN model used for pneumonia prediction and achieves an accuracy of 98.72%. Speed-ups ranging from 1.2 to 1.5 were obtained with the synchronous and asynchronous parallel models. The speed-up is lower in the parallel asynchronous model because of straggler nodes.
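To put the reported speed-up range in perspective, the short sketch below converts a speed-up factor into the fraction of baseline time saved. It assumes the conventional definition of speed-up as baseline time divided by parallel time, which the abstract does not state explicitly; the helper names are hypothetical, and only the factors 1.2 and 1.5 come from the text.

```python
# Assumption: speed-up = baseline time / parallel time (conventional definition).

def speedup(t_baseline, t_parallel):
    """Speed-up of a parallel run relative to a baseline run."""
    return t_baseline / t_parallel

def time_saved_fraction(s):
    """Fraction of the baseline time saved at speed-up factor s."""
    return 1.0 - 1.0 / s

for s in (1.2, 1.5):  # the range reported in the abstract
    print(f"speed-up {s}: {time_saved_fraction(s):.1%} less time than the baseline")
```

Under this definition, a speed-up of 1.2 corresponds to about 16.7% less time than the baseline, and 1.5 to about 33.3% less.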