Oberije Cary J G, Currie Rachel, Leaver Alice, Redman Alan, Teh William, Sharma Nisha, Fox Georgia, Glocker Ben, Khara Galvin, Nash Jonathan, Ng Annie Y, Kecskemethy Peter D
Kheiron Medical Technologies Limited, London, Greater London, UK.
Breast Screening Programme, Royal Devon and Exeter NHS Foundation Trust, Exeter, Devon, UK.
BMJ Health Care Inform. 2025 May 14;32(1):e101318. doi: 10.1136/bmjhci-2024-101318.
To evaluate an artificial intelligence (AI) system in breast screening through results stratified by age, breast density, ethnicity and screening centre, using data from different UK regions.
A large-scale retrospective study was conducted to evaluate two variations of using AI as an independent second reader in double reading. Both clinical and operational metrics were stratified. Data came from 306 839 mammography cases screened between 2017 and 2021 in three different UK regions. The impact on safety and effectiveness was assessed using two clinical metrics, cancer detection rate and positive predictive value, each stratified by age, breast density and ethnicity. Operational impact was assessed through reading workload and recall rate, measured overall and per centre. Non-inferiority was tested for the AI workflows compared with human double reading and, where it was passed, superiority was tested. The AI interval cancer (IC) flag rate was assessed to estimate the additional cancer detection opportunity with AI, which cannot be assessed retrospectively.
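The test sequence described above (non-inferiority first, superiority only if non-inferiority is passed) can be sketched as follows. This is a minimal illustration, not the study's statistical method: it assumes a one-sided z-test on the absolute difference of two independent proportions, an absolute non-inferiority margin, and a one-sided significance level of 0.025, none of which is stated in the abstract. The function names and the counts in the usage example are hypothetical, and the sketch ignores the fact that both workflows read the same cases.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sided_p(x_new, n_new, x_ref, n_ref, shift):
    """P-value for H0: p_new - p_ref <= -shift (unpooled z-test).

    shift = margin gives a non-inferiority test,
    shift = 0 gives a superiority test.
    """
    p_new = x_new / n_new
    p_ref = x_ref / n_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_new - p_ref + shift) / se
    return 1.0 - normal_cdf(z)


def hierarchical_test(x_new, n_new, x_ref, n_ref, margin, alpha=0.025):
    """Test non-inferiority first; test superiority only if it passes."""
    if one_sided_p(x_new, n_new, x_ref, n_ref, margin) >= alpha:
        return "inconclusive"
    if one_sided_p(x_new, n_new, x_ref, n_ref, 0.0) < alpha:
        return "superior"
    return "non-inferior"


if __name__ == "__main__":
    # Hypothetical counts: cancers detected out of cases screened,
    # AI workflow first, human double reading second.
    print(hierarchical_test(850, 100_000, 850, 100_000, margin=0.001))
```

For a metric where a lower value is better, such as recall rate, the direction of the comparison would need to be reversed before applying the same sequence.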
The AI workflows passed non-inferiority or superiority tests for every metric across all subgroups, with workload savings between 38.3% and 43.7%. The standalone AI flagged 41.2% of ICs overall, ranging between 33.3% and 46.8% across subgroups, with the highest flag rate for dense breasts.
Human double reading and the AI workflows showed the same performance disparities across subgroups. The AI integrations maintained or improved performance on all metrics for all subgroups while achieving a significant workload reduction. Moreover, complementing these integrations with AI as an additional reader can improve cancer detection.
The granularity of assessment showed that screening with the AI-system integrations was as safe as standard double reading across heterogeneous populations.