Romanov Stepan, Howell Sacha, Harkness Elaine, Gareth Evans Dafydd, Astley Sue, Fergie Martin
University of Manchester, Manchester, United Kingdom.
The Christie NHS Foundation Trust, Manchester, United Kingdom.
J Med Imaging (Bellingham). 2025 Nov;12(Suppl 2):S22011. doi: 10.1117/1.JMI.12.S2.S22011. Epub 2025 Jun 12.
Breast density estimation is an important part of breast cancer risk assessment, as mammographic density is associated with risk. However, density assessed by multiple experts can be subject to high inter-observer variability, so automated methods are increasingly used. We investigate inter-reader variability and breast cancer risk prediction for expert assessors and for a deep learning approach.
Screening data from a case-control matched cohort of 1328 women were used to compare agreement between two expert readers, and between a single reader and a deep learning model, Manchester artificial intelligence - visual analog scale (MAI-VAS). Bland-Altman analysis was used to assess variability, and the matched concordance index was used to assess risk discrimination.
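As a rough illustration of the Bland-Altman step, the sketch below computes the mean difference (bias) and the conventional 95% limits of agreement (bias ± 1.96 × SD of the paired differences) for two sets of paired density scores. The function name `bland_altman` and the example scores are hypothetical; the abstract does not specify the exact computation used for MAI-VAS, and confidence intervals for the limits are not computed here.

```python
from statistics import mean, stdev


def bland_altman(reader_a, reader_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired density scores.

    Hypothetical helper: assumes the conventional limits of agreement,
    bias +/- z * SD of the paired differences (z = 1.96 for ~95%).
    """
    if len(reader_a) != len(reader_b) or len(reader_a) < 2:
        raise ValueError("need at least two paired readings of equal length")
    # Paired differences between the two assessors (e.g. reader vs. model).
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)   # mean difference (systematic offset)
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    # Hypothetical VAS density scores (0-100) for four women.
    reader = [10, 20, 30, 40]
    model = [12, 18, 33, 37]
    print(bland_altman(reader, model))
```

Narrower limits of agreement mean the two assessors' scores for the same woman tend to lie closer together, which is the sense in which the abstract compares reader-model and reader-reader agreement.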
Although the mean differences in the two experiments were similar, the limits of agreement between MAI-VAS and a single reader were substantially narrower, at +SD (standard deviation) 21 (95% CI: 19.65, 21.69) and -SD 22 (95% CI: , ), than those between two expert readers, at +SD 31 (95% CI: 29.23, 32.08) and -SD 29 (95% CI: , ). In addition, breast cancer risk discrimination was similar for the deep learning method and for density readings from a single expert, with a matched concordance of 0.628 (95% CI: 0.598, 0.658) and 0.624 (95% CI: 0.595, 0.654), respectively. The automated method showed inter-view agreement similar to that of the experts and maintained consistency across density quartiles.
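To make the matched concordance figures more concrete, here is a minimal sketch of one common way to compute a concordance index in a matched case-control design: within each matched set, the case's density score is compared with each of its own matched controls, a higher case score counts as concordant, and ties count as half. The function name `matched_concordance`, the input layout, and the pooling of all within-set comparisons are assumptions for illustration; the paper's exact definition and its confidence-interval method may differ.

```python
def matched_concordance(matched_sets):
    """Concordance index for matched case-control sets.

    matched_sets: list of (case_score, [control_scores]) tuples, where each
    case is compared only with the controls matched to it (hypothetical layout).
    Comparisons are pooled over all sets; ties count as 0.5.
    """
    concordant = 0.0
    comparisons = 0
    for case_score, control_scores in matched_sets:
        for control_score in control_scores:
            comparisons += 1
            if case_score > control_score:
                concordant += 1.0      # case ranked higher than its control
            elif case_score == control_score:
                concordant += 0.5      # tie: half credit
    if comparisons == 0:
        raise ValueError("no case-control comparisons available")
    return concordant / comparisons


if __name__ == "__main__":
    # Hypothetical density scores: one case with three controls, one with one.
    sets = [(60, [40, 70, 60]), (50, [30])]
    print(matched_concordance(sets))
```

Under this definition a value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of cases above their matched controls, so values around 0.62 to 0.63 indicate modest but similar discrimination for both density measures.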
The artificial intelligence breast density assessment tool MAI-VAS shows better inter-observer agreement with a randomly selected expert reader than two expert readers show with each other. Deep learning-based density methods provide consistent density scores without compromising breast cancer risk discrimination.