Marinovich M Luke, Lotter William, Waddell Andrew, Houssami Nehmat
The Daffodil Centre, The University of Sydney, A Joint Venture With Cancer Council NSW, Sydney, NSW, Australia.
Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, Australia.
J Med Screen. 2025 Mar;32(1):48-52. doi: 10.1177/09691413241262960. Epub 2024 Aug 11.
Artificial intelligence (AI) algorithms have been retrospectively evaluated as a replacement for one radiologist in screening mammography double-reading; however, methods for resolving discordance between radiologists and AI in the absence of 'real-world' arbitration may underestimate the cancer detection rate (CDR) and recall. In 108,970 consecutive screens from a population screening program (BreastScreen WA, Western Australia), 20,120 were radiologist/AI discordant without real-world arbitration. Recall probabilities were randomly assigned to these screens in 1000 simulations. Recall thresholds for screen-detected and interval cancers (sensitivity) and for screens with no cancer (false-positive proportion, FPP) were varied to calculate the mean CDR and recall rate for the entire cohort. Assuming 100% sensitivity, the maximum CDR was 7.30 per 1000 screens. To achieve a >95% probability that the mean CDR exceeded the screening program CDR (6.97 per 1000), interval cancer sensitivities of ≥63% (at 100% screen-detected sensitivity) and ≥91% (at 80% screen-detected sensitivity) were required. The mean recall rate was relatively constant across sensitivity assumptions but varied with FPP. An FPP > 6.5% resulted in recall rates that exceeded the program estimate (3.38%). CDR improvements depend on a majority of interval cancers being detected in radiologist/AI discordant screens. Such improvements are likely to increase recall, requiring careful monitoring where AI is deployed for screen-reading.
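The simulation described above can be illustrated with a minimal Monte Carlo sketch. It rests on one plausible reading of the abstract, which is an assumption and not the authors' code: each discordant screen receives a Uniform(0, 1) recall probability and is recalled when that draw falls below the threshold for its outcome class (screen-detected sensitivity, interval-cancer sensitivity, or FPP). A recalled interval cancer is assumed to count as detected. The function name `simulate_discordant_recall`, its parameters, and all counts in the usage example are hypothetical and are not taken from the study data.

```python
import numpy as np


def simulate_discordant_recall(
    n_screens: int,
    concordant_detected: int,   # cancers detected among concordant screens (hypothetical input)
    concordant_recalls: int,    # all recalls among concordant screens, incl. those cancers
    disc_screen_detected: int,  # discordant screens with a screen-detected cancer
    disc_interval: int,         # discordant screens with a later interval cancer
    disc_no_cancer: int,        # discordant screens with no cancer
    sens_screen: float,         # recall threshold (sensitivity) for screen-detected cancers
    sens_interval: float,       # recall threshold (sensitivity) for interval cancers
    fpp: float,                 # recall threshold for no-cancer screens (false-positive proportion)
    program_cdr: float,         # benchmark CDR per 1000 screens
    n_sims: int = 1000,
    seed: int = 0,
) -> dict:
    """Hypothetical sketch: mean CDR and recall rate when discordant screens are recalled at random."""
    rng = np.random.default_rng(seed)
    # Recalling each screen whose Uniform(0, 1) draw falls below the class threshold
    # yields a Binomial(n, threshold) count per class, so the counts are drawn directly.
    sd = rng.binomial(disc_screen_detected, sens_screen, size=n_sims)
    ic = rng.binomial(disc_interval, sens_interval, size=n_sims)
    fp = rng.binomial(disc_no_cancer, fpp, size=n_sims)

    detected = concordant_detected + sd + ic     # cancers detected in each simulation
    recalls = concordant_recalls + sd + ic + fp  # total recalls in each simulation
    cdr = 1000 * detected / n_screens            # per 1000 screens
    recall_rate = 100 * recalls / n_screens      # percent of screens recalled
    return {
        "mean_cdr": float(cdr.mean()),
        "mean_recall_rate": float(recall_rate.mean()),
        # Share of simulations whose CDR exceeds the benchmark (an assumed operationalisation).
        "prob_cdr_exceeds": float((cdr > program_cdr).mean()),
    }


# Usage with made-up counts: 100% sensitivity and 0% FPP gives the deterministic maximum CDR.
print(simulate_discordant_recall(10000, 60, 300, 10, 5, 2000, 1.0, 1.0, 0.0, program_cdr=7.0))
```

Setting both sensitivities to 1 reproduces the idea of a "maximum CDR" under the model. Raising `fpp` leaves the CDR unchanged but increases the recall rate, which mirrors the abstract's observation that recall depends mainly on FPP.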