Chowa Sadia Sultana, Bhuiyan Md Rahad Islam, Tahosin Mst Sazia, Karim Asif, Montaha Sidratul, Hassan Md Mehedi, Shah Mohd Asif, Azam Sami
Faculty of Science and Technology, Charles Darwin University, Casuarina, NT, 0909, Australia.
Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University, Dhaka, 1341, Bangladesh.
Sci Rep. 2025 Jan 2;15(1):226. doi: 10.1038/s41598-024-83972-6.
This study presents a novel privacy-preserving self-supervised learning (SSL) framework for COVID-19 classification from lung CT scans. It uses federated learning (FL) enhanced with Paillier homomorphic encryption (PHE) to prevent third-party attacks during training. The FL-SSL framework employs two publicly available lung CT scan datasets, one treated as labeled and the other as unlabeled. The unlabeled dataset is split into three subsets, which are assumed to have been collected from three hospitals. Training uses the Bootstrap Your Own Latent (BYOL) contrastive learning SSL framework with a VGG19 encoder followed by attention CNN blocks (VGG19 + attention CNN). The input datasets are preprocessed by automatically selecting the largest lung portion of each lung CT scan, and a 64 × 64 input size is used to reduce computational complexity. Healthcare privacy concerns are addressed through collaborative training across decentralized datasets and secure aggregation with PHE. The three subsets of the unlabeled dataset are used to train local BYOL models, which together optimize the central encoder. The labeled dataset is then used to train the central encoder (updated VGG19 + attention CNN), resulting in an accuracy of 97.19%, a precision of 97.43%, and a recall of 98.18%. The reliability of the framework's performance is demonstrated through statistical analysis and five-fold cross-validation. The efficacy of the proposed framework is further demonstrated on three datasets of distinct modalities: skin cancer, breast cancer, and chest X-rays. In conclusion, this study offers a promising solution for accurate COVID-19 diagnosis from lung CT scans, preserving privacy and overcoming the challenges of dataset scarcity and computational complexity.
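The secure aggregation step described in the abstract can be illustrated with a minimal sketch of Paillier's additive homomorphism. This is not the authors' implementation: the toy key built from two fixed Mersenne primes, the fixed-point `SCALE`, and the helper names (`encrypt_weights`, `aggregate`, `decrypt_average`) are all assumptions made for illustration, and the key is far too small to be secure. The sketch also assumes that the three hospitals share the key pair while the aggregating server sees only ciphertexts, which the abstract does not specify.

```python
import math
import secrets

# Toy Paillier key from two fixed Mersenne primes (assumption: illustration only, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus; the generator is fixed to g = N + 1
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)        # modular inverse (Python 3.8+)
SCALE = 10**6                  # hypothetical fixed-point scale for float weights


def encrypt(m: int) -> int:
    """Encrypt an integer; negative values are encoded modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 simplifies to 1 + m*N.
    return ((1 + (m % N) * N) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt to a signed integer in (-N/2, N/2]."""
    m = ((pow(c, LAMBDA, N_SQ) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def encrypt_weights(weights):
    """Client side: quantize each weight and encrypt it."""
    return [encrypt(round(w * SCALE)) for w in weights]


def aggregate(encrypted_updates):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    total = list(encrypted_updates[0])
    for update in encrypted_updates[1:]:
        total = [(a * b) % N_SQ for a, b in zip(total, update)]
    return total


def decrypt_average(cipher_sum, num_clients):
    """Key holder: decrypt the summed weights and average them."""
    return [decrypt(c) / SCALE / num_clients for c in cipher_sum]


if __name__ == "__main__":
    # Three hypothetical hospitals, each with a two-parameter "model".
    local_weights = [[0.5, -1.25], [1.5, 0.25], [1.0, 1.0]]
    encrypted = [encrypt_weights(w) for w in local_weights]
    print(decrypt_average(aggregate(encrypted), len(local_weights)))
```

In this sketch the server only ever multiplies ciphertexts, so it never observes an individual hospital's weights. A real deployment would use a vetted Paillier library with keys of at least 2048 bits and would encrypt full encoder weight tensors rather than short lists.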