Han Taesun, Yun Hyesun, Sur Young Keun, Park Heeboong
Department of Radiology, Park Heeboong Surgical Clinic, 203, Geowerk, 7, Hyowon-ro, 256beon-gil, Gwonseon-gu, Suwon-si 16571, Gyeonggi-do, Republic of Korea.
Department of Family Medicine, Park Heeboong Surgical Clinic, 203, Geowerk, 7, Hyowon-ro, 256beon-gil, Gwonseon-gu, Suwon-si 16571, Gyeonggi-do, Republic of Korea.
Diagnostics (Basel). 2025 May 29;15(11):1368. doi: 10.3390/diagnostics15111368.
Background/Objectives: Artificial intelligence (AI)-based systems are increasingly being used to assist radiologists in detecting breast cancer on mammograms. However, applying fixed AI score thresholds across diverse lesion types may compromise diagnostic performance, especially in women with dense breasts. This study aimed to determine optimal category-specific AI thresholds and to analyze discrepancies between AI predictions and radiologist assessments, particularly for BI-RADS 4A versus 4B/4C lesions. Methods: We retrospectively analyzed 194 mammograms (76 BI-RADS 4A and 118 BI-RADS 4B/4C) using FDA-approved AI software. Lesion characteristics, breast density, AI scores, and pathology results were collected. A receiver operating characteristic (ROC) analysis was conducted to determine the optimal thresholds via Youden's index. Discrepancy analysis focused on BI-RADS 4A lesions with AI scores of ≥35 and BI-RADS 4B/4C lesions with AI scores of <35. Results: AI scores were significantly higher in malignant versus benign cases (72.1 vs. 20.9; p < 0.001). The optimal AI threshold was 19 for BI-RADS 4A (AUC = 0.685) and 63 for BI-RADS 4B/4C (AUC = 0.908). In discordant cases, BI-RADS 4A lesions with scores of ≥35 had a malignancy rate of 43.8%, while BI-RADS 4B/4C lesions with scores of <35 had a malignancy rate of 19.5%. Conclusions: Using category-specific AI thresholds improves diagnostic accuracy and supports radiologist decision-making. However, limitations persist in BI-RADS 4A cases with overlapping scores, reinforcing the need for radiologist oversight and tailored AI integration strategies in clinical practice.
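The abstract states that optimal thresholds were chosen from the ROC analysis via Youden's index, J = sensitivity + specificity − 1. A minimal sketch of that selection rule follows. It is not the authors' code: the function name `youden_threshold`, the convention that a score at or above the threshold counts as positive, and the sample scores are all assumptions made for illustration, and the data are synthetic rather than taken from the study.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumed convention: a case is called positive when score >= threshold.
    labels: 1 = malignant, 0 = benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malignant and one benign case")

    best_threshold, best_j = None, -1.0
    # Every observed score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # on ties, the lowest threshold is kept
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Synthetic, made-up scores for illustration only (not the study's data).
scores = [5, 12, 18, 22, 30, 41, 55, 63, 78, 90]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
threshold, j = youden_threshold(scores, labels)
print(threshold, round(j, 3))
```

To obtain category-specific thresholds in this sketch, the function would be called once per subgroup, for example once on the BI-RADS 4A cases and once on the BI-RADS 4B/4C cases.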