BRAC University, Dhaka, Bangladesh.
University of Liberal Arts Bangladesh, Dhaka, Bangladesh.
PLoS One. 2024 Sep 27;19(9):e0303541. doi: 10.1371/journal.pone.0303541. eCollection 2024.
Ph-negative myeloproliferative neoplasm (MPN) is a rare but dangerous disease that can later progress to more severe disorders. Clinical diagnosis is possible, but it often requires collecting multiple types of pathological findings, which can be tedious and time-consuming. Deep learning studies of the disease are scarce and, because the disease is rare, usually rely on small amounts of pathological data. Existing work also does not address data scarcity beyond common techniques such as data augmentation, which leaves room for performance improvement. To tackle this issue, the proposed research uses knowledge distilled from a model trained on a larger dataset to boost the performance of a lightweight model trained on a small MPN dataset. First, a 50-layer ResNet model is trained on a large lymph node image dataset of 327,680 images, and its learned knowledge is distilled into a small 4-layer CNN model. The CNN model is then initialized with these pre-trained weights and trained further on a small MPN dataset of 300 images. Empirical analysis shows that the CNN with distilled knowledge achieves 97% accuracy, compared with 89.67% for an identical CNN trained from scratch. The distilled knowledge transfer approach also proves more effective than simpler approaches to data scarcity, such as augmentation and manual feature extraction. Overall, the research affirms the effectiveness of transferring distilled knowledge to address data scarcity, and the approach achieves better convergence when a lightweight model is trained on a Ph-negative MPN image dataset.
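The abstract does not state which distillation objective was used. As a minimal sketch of the teacher-to-student step described above, the following assumes the commonly used soft-target formulation: a temperature-softened KL term between teacher and student outputs, blended with ordinary cross-entropy on the true label. The function names, the `temperature` and `alpha` values, and the two-class logits are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term with hard-label cross-entropy.

    Assumed formulation (not confirmed by the source):
      loss = alpha * T^2 * KL(teacher_T || student_T)
             + (1 - alpha) * CE(student, true_label)
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy of the unsoftened student output on the true label.
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


# Hypothetical two-class logits for a single image.
teacher = [2.0, -1.0]
agreeing_student = [2.0, -1.0]
disagreeing_student = [-1.0, 2.0]
print(distillation_loss(agreeing_student, teacher, true_label=0))
print(distillation_loss(disagreeing_student, teacher, true_label=0))
```

Under these assumptions, a student that reproduces the teacher's logits incurs only the hard-label term, while a student that disagrees is penalized by both terms. After distillation, the student's weights would serve as the initialization for further training on the small target dataset.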