Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129, Turin, Italy.
Department of Oncology-Pathology, Karolinska Institute, Stockholm, Sweden; Department of Breast Radiology, Karolinska University Hospital, Stockholm, Sweden.
Med Image Anal. 2025 Jan;99:103320. doi: 10.1016/j.media.2024.103320. Epub 2024 Sep 2.
The potential and promise of deep learning systems to provide an independent assessment and relieve radiologists' burden in screening mammography have been recognized in several studies. However, the low cancer prevalence, the need to process high-resolution images, and the need to combine information from multiple views and scales still pose technical challenges. Multi-view architectures that combine information from the four mammographic views to produce an exam-level classification score are a promising approach to the automated processing of screening mammography. However, training such architectures from exam-level labels, without relying on pixel-level supervision, requires very large datasets and may result in suboptimal accuracy. Emerging architectures such as Vision Transformers (ViT) and graph-based architectures can potentially integrate ipsilateral and contralateral breast views better than traditional convolutional neural networks, thanks to their stronger ability to model long-range dependencies. In this paper, we extensively evaluate novel transformer-based and graph-based architectures against state-of-the-art multi-view convolutional neural networks, trained in a weakly supervised setting on a medium-scale dataset, both in terms of performance and interpretability. Extensive experiments on the CSAW dataset suggest that, while transformer-based architectures outperform the others, different inductive biases lead to complementary strengths and weaknesses, as each architecture is sensitive to different signs and mammographic features. Hence, an ensemble of different architectures should be preferred over a winner-takes-all approach to achieve more accurate and robust results.
Overall, the findings highlight the potential of a wide range of multi-view architectures for breast cancer classification, even in datasets of relatively modest size, although the detection of small lesions remains challenging without pixel-wise supervision or ad-hoc networks.
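The ensemble recommended above can be realized by simple score-level fusion. As a minimal illustrative sketch (not the paper's actual pipeline), the exam-level malignancy scores produced by the CNN, transformer, and graph-based models could be averaged as follows; all array values and function names here are hypothetical:

```python
import numpy as np

# Hypothetical exam-level malignancy scores in [0, 1] produced by three
# multi-view architectures (CNN, ViT, graph-based) on the same three exams.
# The values below are illustrative only, not results from the paper.
scores_cnn = np.array([0.12, 0.81, 0.34])
scores_vit = np.array([0.08, 0.90, 0.55])
scores_gnn = np.array([0.15, 0.76, 0.40])

def ensemble_mean(*model_scores):
    """Unweighted score averaging across architectures (one row per model)."""
    return np.mean(np.stack(model_scores), axis=0)

# Fused exam-level scores; a threshold or ROC analysis would follow.
fused = ensemble_mean(scores_cnn, scores_vit, scores_gnn)
print(fused)
```

Unweighted averaging is the simplest fusion rule; weighted averaging or rank-based fusion are common alternatives when the component models differ markedly in accuracy.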