Division of Diagnostic Radiology, Department of Translational Medicine, Lund University, Malmö, Sweden; Unilabs Mammography Unit, Skåne University Hospital, Malmö, Sweden.
Lancet Oncol. 2023 Aug;24(8):936-944. doi: 10.1016/S1470-2045(23)00298-X.
Retrospective studies have shown promising results using artificial intelligence (AI) to improve mammography screening accuracy and reduce screen-reading workload; however, to our knowledge, a randomised trial has not yet been conducted. We aimed to assess the clinical safety of an AI-supported screen-reading protocol compared with standard screen reading by radiologists following mammography.
In this randomised, controlled, population-based trial, women aged 40-80 years eligible for mammography screening (including general screening with 1·5-2-year intervals and annual screening for those with moderate hereditary risk of breast cancer or a history of breast cancer) at four screening sites in Sweden were informed about the study as part of the screening invitation. Those who did not opt out were randomly allocated (1:1) to AI-supported screening (intervention group) or standard double reading without AI (control group). Screening examinations were automatically randomised by the Picture Archive and Communications System with a pseudo-random number generator after image acquisition. The participants and the radiographers acquiring the screening examinations, but not the radiologists reading the screening examinations, were masked to study group allocation. The AI system (Transpara version 1.7.0) provided an examination-based malignancy risk score on a 10-level scale that was used to triage screening examinations to single reading (score 1-9) or double reading (score 10), with AI risk scores (for all examinations) and computer-aided detection marks (for examinations with risk score 8-10) available to the radiologists doing the screen reading. Here we report the prespecified clinical safety analysis, to be done after 80 000 women were enrolled, to assess the secondary outcome measures of early screening performance (cancer detection rate, recall rate, false positive rate, positive predictive value [PPV] of recall, and type of cancer detected [invasive or in situ]) and screen-reading workload. Analyses were done in the modified intention-to-treat population (ie, all women randomly assigned to a group with one complete screening examination, excluding women recalled due to enlarged lymph nodes diagnosed with lymphoma). The lowest acceptable limit for safety in the intervention group was a cancer detection rate of more than 3 per 1000 participants screened. 
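The triage rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text (score 1-9 to single reading, score 10 to double reading, computer-aided detection marks shown for scores 8-10); the names `ReadingPlan` and `triage` are hypothetical and are not part of Transpara or of the trial software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingPlan:
    """Hypothetical container for how one screening examination is read."""
    readers: int           # 1 = single reading, 2 = double reading
    show_cad_marks: bool   # whether computer-aided detection marks are shown


def triage(risk_score: int) -> ReadingPlan:
    """Map an examination-level AI risk score (1-10) to a reading plan.

    Assumes the rule exactly as written in the abstract; the risk score
    itself is always available to the radiologist, so it is not modelled here.
    """
    if not 1 <= risk_score <= 10:
        raise ValueError("risk score must be an integer from 1 to 10")
    readers = 2 if risk_score == 10 else 1   # only score 10 is double read
    show_cad_marks = risk_score >= 8         # marks shown for scores 8-10
    return ReadingPlan(readers=readers, show_cad_marks=show_cad_marks)
```

Under this sketch, an examination scored 9 is read once with detection marks visible, whereas an examination scored 3 is read once without marks.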
The trial is registered with ClinicalTrials.gov, NCT04838756, and is closed to accrual; follow-up is ongoing to assess the primary endpoint of the trial, interval cancer rate.
Between April 12, 2021, and July 28, 2022, 80 033 women were randomly assigned to AI-supported screening (n=40 003) or double reading without AI (n=40 030). 13 women were excluded from the analysis. The median age was 54·0 years (IQR 46·7-63·9). Race and ethnicity data were not collected. AI-supported screening among 39 996 participants resulted in 244 screen-detected cancers, 861 recalls, and a total of 46 345 screen readings. Standard screening among 40 024 participants resulted in 203 screen-detected cancers, 817 recalls, and a total of 83 231 screen readings. Cancer detection rates were 6·1 (95% CI 5·4-6·9) per 1000 screened participants in the intervention group, above the lowest acceptable limit for safety, and 5·1 (4·4-5·8) per 1000 in the control group, a ratio of 1·2 (95% CI 1·0-1·5; p=0·052). Recall rates were 2·2% (95% CI 2·0-2·3) in the intervention group and 2·0% (1·9-2·2) in the control group. The false positive rate was 1·5% (95% CI 1·4-1·7) in both groups. The PPV of recall was 28·3% (95% CI 25·3-31·5) in the intervention group and 24·8% (21·9-28·0) in the control group. In the intervention group, 184 (75%) of 244 cancers detected were invasive and 60 (25%) were in situ; in the control group, 165 (81%) of 203 cancers were invasive and 38 (19%) were in situ. The screen-reading workload was reduced by 44·3% using AI.
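The point estimates above follow directly from the reported counts, and the arithmetic can be checked with a short Python sketch. The class and method names are hypothetical. Two definitions are assumptions rather than statements from the abstract: the false positive rate is taken as recalls without a screen-detected cancer divided by participants, and the workload reduction as one minus the ratio of total screen readings. Both reproduce the reported figures to one decimal place. Confidence intervals are not recomputed, because the interval method is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """Counts reported for one study group (hypothetical helper class)."""
    participants: int
    cancers: int    # screen-detected cancers
    recalls: int
    readings: int   # total screen readings

    def detection_per_1000(self) -> float:
        return 1000 * self.cancers / self.participants

    def recall_pct(self) -> float:
        return 100 * self.recalls / self.participants

    def false_positive_pct(self) -> float:
        # Assumed definition: recalled women without a screen-detected cancer.
        return 100 * (self.recalls - self.cancers) / self.participants

    def ppv_pct(self) -> float:
        return 100 * self.cancers / self.recalls


ai = Arm(participants=39_996, cancers=244, recalls=861, readings=46_345)
control = Arm(participants=40_024, cancers=203, recalls=817, readings=83_231)

# Ratio of cancer detection rates, intervention versus control.
detection_ratio = ai.detection_per_1000() / control.detection_per_1000()

# Assumed definition: relative drop in the total number of screen readings.
workload_reduction_pct = 100 * (1 - ai.readings / control.readings)

# Prespecified safety limit: more than 3 cancers per 1000 screened.
meets_safety_limit = ai.detection_per_1000() > 3

for name, arm in (("AI-supported", ai), ("Control", control)):
    print(
        f"{name}: detection {arm.detection_per_1000():.1f} per 1000, "
        f"recall {arm.recall_pct():.1f}%, "
        f"false positive {arm.false_positive_pct():.1f}%, "
        f"PPV {arm.ppv_pct():.1f}%"
    )
print(f"Detection ratio {detection_ratio:.1f}, "
      f"workload reduction {workload_reduction_pct:.1f}%")
```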
AI-supported mammography screening resulted in a similar cancer detection rate compared with standard double reading, with a substantially lower screen-reading workload, indicating that the use of AI in mammography screening is safe. The trial was thus not halted and the primary endpoint of interval cancer rate will be assessed in 100 000 enrolled participants after 2 years of follow-up.
Funded by the Swedish Cancer Society, the Confederation of Regional Cancer Centres, and Swedish governmental funding for clinical research (ALF).