Hovda Tone, Larsen Marthe, Bergan Marie Burns, Gjesvik Jonas, Akslen Lars A, Hofvind Solveig
Department of Radiology, Vestre Viken Hospital Trust, Drammen, Norway.
Section for Breast Cancer Screening, Cancer Registry of Norway, Norwegian Institute of Public Health, Oslo, Norway.
Eur Radiol. 2025 Mar 26. doi: 10.1007/s00330-025-11521-4.
To retrospectively evaluate the performance of a CE-marked AI system for identifying breast cancer on screening mammograms. Evidence from large retrospective studies is crucial for planning prospective studies and for ensuring safe implementation.
We used data from screening examinations performed from 2004 to 2021 at ten breast centers in BreastScreen Norway. In the standard independent double reading setting, each radiologist scored each breast from 1 (negative) to 5 (high probability of cancer). The AI system assigned each examination an NT and an SN score; the NT score aimed to classify examinations as negative with minimal misclassification, while the SN score aimed to classify examinations as positive with high confidence. N70 was defined as being among the 70% of examinations with the lowest NT score, and P3 as being among the 3% of examinations with the highest SN score.
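The N70 and P3 definitions above can be illustrated with a minimal sketch. It assumes only that each examination has a numeric NT score and a numeric SN score; the function names, the rank-based cut, and the tie handling are illustrative assumptions, not the AI vendor's or the authors' method.

```python
def flag_lowest_fraction(scores, fraction):
    """Mark the `fraction` of examinations with the lowest scores (e.g. N70)."""
    n_flagged = round(len(scores) * fraction)
    # Indices ordered from lowest to highest score; ties keep input order (assumption).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flags = [False] * len(scores)
    for i in order[:n_flagged]:
        flags[i] = True
    return flags


def flag_highest_fraction(scores, fraction):
    """Mark the `fraction` of examinations with the highest scores (e.g. P3)."""
    n_flagged = round(len(scores) * fraction)
    # Indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flags = [False] * len(scores)
    for i in order[:n_flagged]:
        flags[i] = True
    return flags


# Hypothetical toy data: 100 examinations with made-up scores.
nt_scores = list(range(100))
sn_scores = list(range(100))

in_n70 = flag_lowest_fraction(nt_scores, 0.70)   # treated as negative by AI
in_p3 = flag_highest_fraction(sn_scores, 0.03)   # treated as positive by AI
```

In this sketch the two flags are computed independently from two separate scores, which mirrors the abstract's description of NT and SN as distinct outputs.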
A total of 1,017,208 screening examinations were included in the study sample. At N70, 1.8% (107/5977) of the screen-detected and 34.5% (625/1812) of the interval cancers were defined as negative. Using P3 to define cases as positive, 81.5% (4871/5977) of the screen-detected and 19.0% (344/1812) of the interval cancers were defined as positive. Among the screen-detected cancers in N70, 11.2% (12/107) had an interpretation score > 2 by both radiologists.
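As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The snippet uses only numbers stated in the abstract; the labels and the helper name `pct` are illustrative.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs taken directly from the Results text.
reported = {
    "screen-detected cancers classified negative at N70": (107, 5977),
    "interval cancers classified negative at N70": (625, 1812),
    "screen-detected cancers classified positive at P3": (4871, 5977),
    "interval cancers classified positive at P3": (344, 1812),
    "N70 screen-detected cancers scored > 2 by both radiologists": (12, 107),
}

percentages = {label: pct(k, n) for label, (k, n) in reported.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```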
The AI system performed well in identifying both negative cases and cancer cases. Thus, the AI system can be used to reduce workload for the radiologists and potentially increase the sensitivity of mammography.
Question: Results from large mammography screening samples not used in training AI algorithms are important to consider when planning prospective studies and implementation. Findings: More than 80% of the screen-detected cancers were classified as positive by AI when the 3% of examinations with the highest AI risk score were considered positive. Clinical relevance: A lack of radiologists is a challenge in mammographic screening. Our findings support other studies that suggest the use of AI to reduce screen-reading workload.