Ayana Gelan, Park Jinhyung, Choe Se-Woon
Department of Medical IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39253, Korea.
Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39253, Korea.
Cancers (Basel). 2022 Mar 1;14(5):1280. doi: 10.3390/cancers14051280.
Despite great achievements in classifying mammographic breast-mass images via deep learning (DL), obtaining large amounts of training data and ensuring generalization across different datasets with robust and well-optimized algorithms remain challenging. ImageNet-based transfer learning (TL) and patch classifiers have been used to address these challenges. However, researchers have been unable to achieve the performance required for DL to be used as a standalone tool. In this study, we propose a novel multi-stage TL approach, starting from models pre-trained on ImageNet and on cancer cell line images, to classify mammographic breast masses as either benign or malignant. We trained our model on three public datasets: Digital Database for Screening Mammography (DDSM), INbreast, and Mammographic Image Analysis Society (MIAS). In addition, a mixed dataset combining images from these three datasets was used to train the model. We obtained average five-fold cross-validation AUCs of 1, 0.9994, 0.9993, and 0.9998 for the DDSM, INbreast, MIAS, and mixed datasets, respectively. Moreover, the performance improvement of our method over the patch-based method was statistically significant, with a p-value of 0.0029. Furthermore, our patchless approach performed better than patch- and whole image-based methods, improving test accuracy on the INbreast dataset by about 8 percentage points (from 91.41% to 99.34%). The proposed method helps address the need for a large training dataset and reduces the computational burden of training and deploying mammography-based deep-learning models for early diagnosis of breast cancer.
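The "average five-fold cross-validation AUC" reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: the stratified splitting, the rank-based AUC, and the names `auc`, `stratified_folds`, `cross_validated_auc`, and `score_fn` are assumptions introduced here for illustration, and `score_fn` is a hypothetical stand-in for training a classifier on the training folds and scoring the held-out fold.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, spreading each class evenly (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def cross_validated_auc(labels, score_fn, k=5, seed=0):
    """Return (mean AUC, per-fold AUCs).

    score_fn(train_idx, test_idx) is a hypothetical callback that fits a model
    on train_idx and returns one malignancy score per index in test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    fold_aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores = score_fn(train_idx, test_idx)
        fold_aucs.append(auc([labels[j] for j in test_idx], scores))
    return mean(fold_aucs), fold_aucs
```

In practice `score_fn` would wrap fine-tuning and inference for the model under evaluation; each fold needs at least one benign and one malignant sample, which the stratified split provides when every class has at least `k` samples.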