Aguilar Carlos, Pacilè Serena, Weber Nicolas, Fillard Pierre
Therapixel, 06200 Nice, France.
Life (Basel). 2023 Feb 4;13(2):440. doi: 10.3390/life13020440.
We propose a methodology for monitoring an artificial intelligence (AI) tool for breast cancer screening once it is deployed in clinical centers. The AI, trained to detect suspicious regions of interest in the four views of a mammogram and to characterize their level of suspicion with a score ranging from one (low suspicion) to ten (high suspicion of malignancy), was deployed in four radiological centers across the US. Results collected between April 2021 and December 2022 formed a dataset of 36,581 AI records. To assess the behavior of the AI, the score distribution in each center was compared to a reference distribution obtained in silico, using the Pearson correlation coefficient (PCC) between the two distributions. The estimated PCCs were 0.998 [min: 0.993, max: 0.999] for center US-1, 0.975 [min: 0.923, max: 0.986] for US-2, 0.995 [min: 0.972, max: 0.998] for US-3 and 0.994 [min: 0.962, max: 0.982] for US-4. These values show that the AI behaved as expected. Low PCC values could be used to trigger an alert, which would facilitate the detection of software malfunctions. This methodology can help create new indicators to improve the monitoring of software deployed in hospitals.
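The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that a center's score distribution is the fraction of AI records at each score from one to ten, and that the PCC is computed between that 10-bin vector and the reference vector. The reference values, the alert threshold of 0.9, and all function names below are hypothetical and chosen only for illustration; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt

SCORES = range(1, 11)  # AI suspicion scores: 1 (low) to 10 (high)

# Hypothetical reference distribution (fractions per score, summing to 1).
# Illustrative values only, not taken from the paper.
REFERENCE = [0.30, 0.22, 0.15, 0.10, 0.08, 0.06, 0.04, 0.03, 0.01, 0.01]


def score_distribution(scores):
    """Return the fraction of records at each score from 1 to 10."""
    scores = list(scores)
    if not scores:
        raise ValueError("at least one AI record is required")
    counts = Counter(scores)
    return [counts.get(s, 0) / len(scores) for s in SCORES]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def check_center(scores, reference=REFERENCE, threshold=0.9):
    """Return (pcc, alert); alert is True when pcc is below the threshold.

    The threshold is an assumed value, not one reported in the abstract.
    """
    pcc = pearson(score_distribution(scores), reference)
    return pcc, pcc < threshold


# A center whose scores follow the reference, and a simulated malfunction
# in which the software returns the maximum score for every exam.
healthy_scores = [s for s, n in zip(SCORES, [30, 22, 15, 10, 8, 6, 4, 3, 1, 1])
                  for _ in range(n)]
faulty_scores = [10] * 100

healthy_pcc, healthy_alert = check_center(healthy_scores)
faulty_pcc, faulty_alert = check_center(faulty_scores)
```

In this sketch the healthy center yields a PCC close to one and no alert, while the simulated malfunction yields a negative PCC and raises an alert. In practice the same check could be repeated over successive time windows for each center, with the threshold tuned on historical data.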