Department of Radiology, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, Republic of Korea.
Medical Research Collaborating Center, Seoul National University Hospital, Seoul, Republic of Korea.
Acad Radiol. 2024 Jun;31(6):2239-2247. doi: 10.1016/j.acra.2023.12.006. Epub 2024 Jan 11.
Little is known about the factors affecting the performance of artificial intelligence (AI) software on mammography for breast cancer detection. This study aimed to identify factors associated with the abnormality scores assigned by the AI software.
A retrospective database search was conducted to identify consecutive asymptomatic women who underwent breast cancer surgery between April 2016 and December 2019. Commercially available AI software (Lunit INSIGHT, MMG, Ver. 1.1.4.0) was applied to preoperative mammograms to assign an individual abnormality score to each lesion, and a score of 10 or higher was considered a positive detection by the AI software. Radiologists without knowledge of the AI results retrospectively assessed mammographic density and classified the mammographic findings as positive or negative. General linear model (GLM) analysis was used to identify the clinical, pathological, and mammographic findings related to the abnormality scores, yielding β coefficients that represent the mean difference in score per unit increase (for continuous variables) or relative to the reference category (for categorical variables). Additionally, the reasons for non-detection by the AI software were investigated.
Among the 1001 index cancers (830 invasive cancers and 171 ductal carcinomas in situ) in 1001 patients, 717 (72%) were correctly detected by AI, while the remaining 284 (28%) were not detected. Multivariable GLM analysis showed that abnormal mammographic findings (β = 77.0 for mass, β = 73.1 for calcification only, β = 49.4 for architectural distortion, and β = 47.6 for asymmetry, each compared to negative findings; all P < 0.001), invasive tumor size (β = 4.3 per 1 cm, P < 0.001), and human epidermal growth factor receptor 2 (HER2) positivity (β = 9.2 compared to hormone receptor positive, HER2 negative; P = 0.004) were associated with a higher mean abnormality score. AI failed to detect small asymmetries in extremely dense breasts, subcentimeter-sized or isodense lesions, and faint amorphous calcifications.
Cancers with positive mammographic findings on retrospective review, large invasive size, and HER2 positivity had higher AI abnormality scores. Understanding the patterns of AI software performance is crucial for effectively integrating AI into clinical practice.
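As a reading aid for the reported β coefficients, the following minimal Python sketch shows how they combine into an adjusted difference in mean abnormality score between two lesion profiles. It assumes a purely additive model with no interaction terms and all other covariates held equal. The intercept and the coefficients of the remaining covariates are not reported in the abstract, so only differences between profiles can be computed, not absolute scores. The function and variable names are hypothetical and are not taken from the study or from the AI software.

```python
# Beta coefficients as reported in the abstract (multivariable GLM).
# Assumption: additive model, no interactions, other covariates held equal.
FINDING_BETA = {
    "negative": 0.0,  # reference category
    "mass": 77.0,
    "calcification_only": 73.1,
    "architectural_distortion": 49.4,
    "asymmetry": 47.6,
}
SIZE_BETA_PER_CM = 4.3    # per 1 cm of invasive tumor size
HER2_POSITIVE_BETA = 9.2  # vs. hormone receptor positive, HER2 negative

DETECTION_THRESHOLD = 10  # score of 10 or higher counts as a positive detection


def linear_term(finding, invasive_size_cm, her2_positive):
    """Sum of the reported terms of the linear predictor (intercept omitted).

    her2_positive=False is taken to mean the reference subtype
    (hormone receptor positive, HER2 negative); other subtypes have
    coefficients that the abstract does not report.
    """
    term = FINDING_BETA[finding] + SIZE_BETA_PER_CM * invasive_size_cm
    if her2_positive:
        term += HER2_POSITIVE_BETA
    return term


def mean_score_difference(profile_a, profile_b):
    """Adjusted difference in mean abnormality score, profile_a minus profile_b."""
    return linear_term(**profile_a) - linear_term(**profile_b)


def is_detected(score, threshold=DETECTION_THRESHOLD):
    """Apply the detection rule stated in the methods."""
    return score >= threshold


if __name__ == "__main__":
    a = {"finding": "mass", "invasive_size_cm": 2.0, "her2_positive": True}
    b = {"finding": "asymmetry", "invasive_size_cm": 1.0, "her2_positive": False}
    print(round(mean_score_difference(a, b), 1))
```

Under these assumptions, a 2 cm HER2-positive mass and a 1 cm asymmetry of the reference subtype differ by (77.0 - 47.6) + 4.3 × (2 - 1) + 9.2 points in model-estimated mean score. This is a difference between group means, not a prediction for an individual lesion.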