Schurz Haiko, Solander Klara, Åström Davida, Cossío Fernando, Choi Taeyang, Dustler Magnus, Lundström Claes, Gustafsson Håkan, Zackrisson Sophia, Strand Fredrik
Department of Oncology-Pathology, Karolinska Institutet, Solna, Sweden.
Medical Diagnostics Karolinska, Karolinska University Hospital, Solna, Sweden.
NPJ Digit Med. 2025 May 8;8(1):259. doi: 10.1038/s41746-025-01623-0.
AI cancer detection models require calibration to attain the desired balance between cancer detection rate (CDR) and false positive rate. In this study, we simulate the impact of six types of mismatches between the calibration population and the clinical target population by creating purposefully non-representative datasets to calibrate AI for clinical settings. Mismatching the acquisition year between healthy and cancer-diagnosed screening participants led to a distortion in CDR between -3% and +19%. Mismatching age led to a distortion in CDR between -0.2% and +27%. Mismatching breast density distribution led to a distortion in CDR between +1% and +16%. Mismatching mammography vendors led to a distortion in CDR between -32% and +33%. Mismatches between the calibration population and the target clinical population lead to clinically important deviations. It is vital for safe clinical AI integration to ensure that important aspects of the calibration population are representative of the target population.
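The calibration effect described above can be illustrated with a minimal Python sketch. It is not the study's method: the function names, the target false positive rate of 20%, and all scores are hypothetical toy values, and the fraction of cancer cases flagged is used as a simple stand-in for CDR. The sketch picks a decision threshold on a calibration set of healthy cases, applies it to the same cancer cases, and reports the relative change in detection when the calibration set is not representative.

```python
def flag_rate(scores, threshold):
    """Fraction of cases whose AI score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def calibrate_threshold(healthy_scores, target_fpr):
    """Lowest observed score that keeps the false positive rate <= target_fpr."""
    for candidate in sorted(set(healthy_scores)):
        if flag_rate(healthy_scores, candidate) <= target_fpr:
            return candidate
    return float("inf")  # no observed score meets the target: flag nothing


# Hypothetical AI scores (toy data, not taken from the study).
representative_healthy = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
# Assumed mismatch: healthy calibration cases that score systematically higher,
# for example because they come from a different vendor or acquisition year.
mismatched_healthy = [0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
target_cancers = [0.30, 0.42, 0.48, 0.55, 0.62, 0.70, 0.80, 0.90]

TARGET_FPR = 0.2  # assumed operating point

good_threshold = calibrate_threshold(representative_healthy, TARGET_FPR)
bad_threshold = calibrate_threshold(mismatched_healthy, TARGET_FPR)

expected_detection = flag_rate(target_cancers, good_threshold)
actual_detection = flag_rate(target_cancers, bad_threshold)

# Relative distortion in detection caused by the non-representative calibration set.
distortion = (actual_detection - expected_detection) / expected_detection

print(f"threshold: {good_threshold} vs {bad_threshold}")
print(f"detection: {expected_detection:.3f} vs {actual_detection:.3f}")
print(f"relative distortion: {distortion:+.1%}")
```

In this toy setup the mismatched calibration set pushes the threshold upward, so fewer cancers in the target population are flagged than the calibration suggested. A mismatch in the opposite direction would lower the threshold and inflate both detections and false positives.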