Blahová Linda, Kostolný Jozef, Cimrák Ivan
Faculty of Management Science and Informatics, University of Žilina, 010 26 Žilina, Slovakia.
Bioengineering (Basel). 2025 Feb 24;12(3):232. doi: 10.3390/bioengineering12030232.
The application of machine learning to breast cancer detection has advanced significantly thanks to the availability of annotated mammography datasets. This paper reviews mammography studies that use key datasets such as CBIS-DDSM, VinDr-Mammo, and CSAW-CC, which play a critical role in training classification and detection models. From these studies, the review compiles the data augmentation techniques used in mammography and examines their impact on performance in detecting abnormalities in breast tissue. It also discusses the challenge of dataset imbalance and presents potential solutions, such as synthetic data generation and GAN-based augmentation. The work underscores the importance of dataset design tailored to the experiments, detailed annotations, and the choice of machine learning models and architectures in improving breast cancer screening models, with a focus on BI-RADS classification. Future directions include refining augmentation methods, addressing class imbalance, and enhancing model interpretability through tools like Grad-CAM.