de Vries Clarisse F, Colosimo Samantha J, Staff Roger T, Dymiter Jaroslaw A, Yearsley Joseph, Dinneen Deirdre, Boyle Moragh, Harrison David J, Anderson Lesley A, Lip Gerald
From the Aberdeen Centre for Health Data Science, Institute of Applied Health Sciences (C.F.d.V., M.B., L.A.A.), School of Medicine, Medical Science and Nutrition (S.J.C., R.T.S.), and Grampian Data Safe Haven (DaSH), Aberdeen Centre for Health Data Science, Institute of Applied Health Sciences (J.A.D.), University of Aberdeen, Polwarth Building, Foresterhill, Aberdeen AB24 3FX, Scotland; National Health Service Grampian (NHSG), Aberdeen Royal Infirmary, Aberdeen, Scotland (S.J.C., R.T.S., G.L.); Kheiron Medical Technologies, London, England (J.Y., D.D.); and School of Medicine, University of St Andrews, St Andrews, Scotland (D.J.H.).
Radiol Artif Intell. 2023 Mar 22;5(3):e220146. doi: 10.1148/ryai.220146. eCollection 2023 May.
Artificial intelligence (AI) tools may assist breast screening mammography programs, but limited evidence supports their generalizability to new settings. This retrospective study used a 3-year dataset (April 1, 2016-March 31, 2019) from a U.K. regional screening program. The performance of a commercially available breast screening AI algorithm was assessed with both a prespecified decision threshold and a site-specific decision threshold to evaluate whether its performance was transferable to a new clinical site. The dataset consisted of women (aged approximately 50-70 years) who attended routine screening, excluding self-referrals, women with complex physical requirements, women who had undergone a previous mastectomy, and screening episodes that had technical recalls or lacked the four standard image views. In total, 55 916 screening attendees (mean age, 60 years ± 6 [SD]) met the inclusion criteria. The prespecified threshold resulted in a high recall rate (48.3%, 21 929 of 45 444), which fell to 13.0% (5896 of 45 444) after threshold calibration, closer to the observed service level (5.0%, 2774 of 55 916). Recall rates also increased approximately threefold after a software upgrade on the mammography equipment, requiring per-software-version thresholds. Using software-specific thresholds, the AI algorithm would have recalled 277 of 303 (91.4%) screen-detected cancers and 47 of 138 (34.1%) interval cancers. AI performance and thresholds should be validated for new clinical settings before deployment, and quality assurance systems should monitor AI performance for consistency.
Keywords: Breast, Screening, Mammography, Computer Applications-Detection/Diagnosis, Neoplasms-Primary, Technology Assessment
© RSNA, 2023.
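The percentages reported above can be checked directly from their counts, and the idea of calibrating a threshold per software version can be illustrated with a small sketch. The abstract does not state how the site-specific thresholds were derived, so the sketch assumes one simple approach (choosing the score at which a target fraction of examinations would be recalled). The helper names (`percent`, `calibrate_threshold`, `recall_rate`, `per_version_thresholds`), the version labels, and the scores are all hypothetical, and the scores are synthetic, not study data.

```python
import random
from collections import defaultdict


def percent(numerator: int, denominator: int) -> float:
    """Proportion as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts reported in the abstract, with the percentages they reproduce.
reported = {
    "recall, prespecified threshold": (21929, 45444),  # 48.3
    "recall, calibrated threshold": (5896, 45444),     # 13.0
    "observed service recall": (2774, 55916),          # 5.0
    "screen-detected cancers recalled": (277, 303),    # 91.4
    "interval cancers recalled": (47, 138),            # 34.1
}
for label, (num, den) in reported.items():
    print(f"{label}: {percent(num, den)}%")


def calibrate_threshold(scores, target_rate):
    """Hypothetical helper: return the score at which roughly `target_rate`
    of examinations are recalled (a recall means score >= threshold)."""
    if not scores:
        raise ValueError("no scores supplied")
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))  # number of exams to recall
    return ranked[k - 1]


def recall_rate(scores, threshold):
    """Fraction of examinations whose score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def per_version_thresholds(records, target_rate):
    """Calibrate a separate threshold for each (software_version, score) group."""
    by_version = defaultdict(list)
    for version, score in records:
        by_version[version].append(score)
    return {v: calibrate_threshold(s, target_rate) for v, s in by_version.items()}


# Synthetic scores: "v2" is skewed upward to mimic a software upgrade
# that shifts the AI score distribution (an assumption for illustration).
random.seed(0)
v1_scores = [random.random() for _ in range(10_000)]
v2_scores = [random.random() ** 0.5 for _ in range(10_000)]
records = [("v1", s) for s in v1_scores] + [("v2", s) for s in v2_scores]

thresholds = per_version_thresholds(records, target_rate=0.05)
print("v2 recall with v1 threshold:", recall_rate(v2_scores, thresholds["v1"]))
print("v2 recall with v2 threshold:", recall_rate(v2_scores, thresholds["v2"]))
```

Applying the threshold calibrated on one score distribution to a shifted distribution inflates the recall rate, while recalibrating per version restores the target; this mirrors, in simplified form, why the study needed software-specific thresholds.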