Analysis of Chest X-ray for COVID-19 Diagnosis as a Use Case for an HPC-Enabled Data Analysis and Machine Learning Platform for Medical Diagnosis Support

Authors

Barakat Chadi, Aach Marcel, Schuppert Andreas, Brynjólfsson Sigurður, Fritsch Sebastian, Riedel Morris

Affiliations

School of Engineering and Natural Science, University of Iceland, 107 Reykjavik, Iceland.

Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.

Publication information

Diagnostics (Basel). 2023 Jan 20;13(3):391. doi: 10.3390/diagnostics13030391.

DOI:10.3390/diagnostics13030391

PMID:36766496

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9914706/

Abstract

The COVID-19 pandemic shed light on the need for quick diagnosis tools in healthcare, leading to the development of several algorithmic models for disease detection. Though these models are relatively easy to build, their training requires a lot of data, storage, and resources, which may not be available for use by medical institutions or could be beyond the skillset of the people who most need these tools. This paper describes a data analysis and machine learning platform that takes advantage of high-performance computing infrastructure for medical diagnosis support applications. This platform is validated by re-training a previously published deep learning model (COVID-Net) on new data, where it is shown that the performance of the model is improved through large-scale hyperparameter optimisation that uncovered optimal training parameter combinations. The per-class accuracy of the model, especially for COVID-19 and pneumonia, is higher when using the tuned hyperparameters (healthy: 96.5%; pneumonia: 61.5%; COVID-19: 78.9%) as opposed to parameters chosen through traditional methods (healthy: 93.6%; pneumonia: 46.1%; COVID-19: 76.3%). Furthermore, training speed-up analysis shows a major decrease in training time as resources increase, from 207 min using 1 node to 54 min when distributed over 32 nodes, but highlights the presence of a cut-off point where the communication overhead begins to affect performance. The developed platform is intended to provide the medical field with a technical environment for developing novel portable artificial-intelligence-based tools for diagnosis support.
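The figures quoted in the abstract can be put into perspective with a little arithmetic. The sketch below is not taken from the paper: it simply computes the speed-up and parallel efficiency implied by the two reported training times (207 min on 1 node, 54 min on 32 nodes) and the per-class accuracy gains in percentage points. The function and variable names are illustrative assumptions.

```python
# Illustrative arithmetic on the numbers reported in the abstract.
# All names here are hypothetical; only the numeric inputs come from the text.

def speedup(t_base_min: float, t_parallel_min: float) -> float:
    """Ratio of baseline training time to distributed training time."""
    return t_base_min / t_parallel_min


def efficiency(t_base_min: float, t_parallel_min: float, nodes: int) -> float:
    """Speed-up divided by node count (1.0 would be ideal linear scaling)."""
    return speedup(t_base_min, t_parallel_min) / nodes


# Training times from the abstract: 1 node vs. 32 nodes.
s = speedup(207, 54)
e = efficiency(207, 54, 32)
print(f"speed-up: {s:.2f}x, efficiency: {e:.1%}")

# Per-class accuracy (%) from the abstract: traditionally chosen vs. tuned.
baseline = {"healthy": 93.6, "pneumonia": 46.1, "COVID-19": 76.3}
tuned = {"healthy": 96.5, "pneumonia": 61.5, "COVID-19": 78.9}

# Gain in percentage points, rounded to avoid floating-point noise.
gains = {cls: round(tuned[cls] - baseline[cls], 1) for cls in baseline}
for cls, gain in gains.items():
    print(f"{cls}: +{gain} percentage points")
```

Under these numbers the 32-node run is roughly 3.8 times faster than the single-node run, which corresponds to a parallel efficiency of about 12%. That is consistent with the communication overhead the abstract mentions, although the two endpoints alone do not reveal where the cut-off point lies. The largest accuracy gain is for the pneumonia class.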
