Bobowicz Maciej, Rygusik Marlena, Buler Jakub, Buler Rafał, Ferlin Maria, Kwasigroch Arkadiusz, Szurowska Edyta, Grochowski Michał
2nd Department of Radiology, Medical University of Gdansk, 80-214 Gdansk, Poland.
Department of Intelligent Control Systems and Decision Support, Faculty of Electrical and Control Engineering, Gdansk University of Technology, 80-233 Gdansk, Poland.
Cancers (Basel). 2023 May 10;15(10):2704. doi: 10.3390/cancers15102704.
Breast cancer is the most frequent cancer in women, with a considerable disease burden and high mortality. Early diagnosis with screening mammography might be facilitated by automated systems supported by deep learning. We propose a model based on a weakly supervised Clustering-constrained Attention Multiple Instance Learning (CLAM) classifier that can train effectively under data scarcity. We used a private dataset with 1174 non-cancer and 794 cancer images labelled at the image level with pathological ground truth confirmation. We used feature extractors (ResNet-18, ResNet-34, ResNet-50 and EfficientNet-B0) pre-trained on ImageNet. The best results were achieved with multimodal-view classification using both craniocaudal (CC) and mediolateral oblique (MLO) images simultaneously, resized by half, with a patch size of 224 px and an overlap of 0.25. This configuration resulted in AUC-ROC 0.896 ± 0.017, F1-score 81.8 ± 3.2, accuracy 81.6 ± 3.2, precision 82.4 ± 3.3, and recall 81.6 ± 3.2. Evaluation on the Chinese Mammography Database, with 5-fold cross-validation, patient-wise data splits, and transfer learning, resulted in AUC-ROC 0.848 ± 0.015, F1-score 78.6 ± 2.0, accuracy 78.4 ± 1.9, precision 78.8 ± 2.0, and recall 78.4 ± 1.9. The attention maps produced by the CLAM algorithm indicate which image features were most relevant to its decision. Our approach was more effective than those reported in many other studies, and it offers some explainability, making it possible to identify erroneous predictions that were based on the wrong premises.
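The patching and attention steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an "overlap of 0.25" means adjacent 224 px patches share a quarter of their width (a stride of 168 px), and it uses a simplified, non-gated attention pooling in place of the full CLAM attention network. The function names `patch_origins`, `extract_patches`, and `attention_pool`, and the weight arrays `v` and `w`, are hypothetical and stand in for learned parameters.

```python
import numpy as np

def patch_origins(length, patch=224, overlap=0.25):
    """Start coordinates of patches along one image axis.

    Assumption: `overlap` is the fraction of the patch shared with
    its neighbour, so the stride is patch * (1 - overlap).
    """
    stride = int(round(patch * (1 - overlap)))  # 224 * 0.75 = 168
    if length < patch:
        return []
    return list(range(0, length - patch + 1, stride))

def extract_patches(image, patch=224, overlap=0.25):
    """Cut a 2-D image (at least patch x patch) into overlapping tiles."""
    ys = patch_origins(image.shape[0], patch, overlap)
    xs = patch_origins(image.shape[1], patch, overlap)
    tiles = [image[y:y + patch, x:x + patch] for y in ys for x in xs]
    return np.stack(tiles)

def attention_pool(features, w, v):
    """Simplified attention pooling over a bag of patch features.

    features: (n_patches, d) array, e.g. outputs of a pre-trained CNN.
    v: (d, h) and w: (h,) stand in for learned attention weights.
    Returns per-patch attention weights and the pooled bag feature.
    """
    scores = np.tanh(features @ v) @ w        # one score per patch
    scores = scores - scores.max()            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax
    return weights, weights @ features

# Usage with placeholder data (random values, not mammograms).
image = np.zeros((560, 392))
patches = extract_patches(image)              # 3 rows x 2 columns of tiles
rng = np.random.default_rng(0)
feats = rng.normal(size=(len(patches), 8))    # stand-in for CNN features
attn, bag = attention_pool(feats, rng.normal(size=4), rng.normal(size=(8, 4)))
```

The per-patch weights in `attn` are the kind of quantity that an attention map visualises: mapped back to patch positions, they show which regions contributed most to the image-level prediction.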