Role of sureness in evaluating AI/CADx: Lesion-based repeatability of machine learning classification performance on breast MRI.

Affiliations

Department of Radiology, The University of Chicago, Chicago, Illinois, USA.

Department of Physics, Wheaton College, Wheaton, Illinois, USA.

Publication information

Med Phys. 2024 Mar;51(3):1812-1821. doi: 10.1002/mp.16673. Epub 2023 Aug 21.

Abstract

BACKGROUND

Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly when comparing performance across classifiers.

PURPOSE

The purpose of this study was to investigate the use case of two standard classifiers to (1) compare the overall performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine whether one classifier provides an advantage in lesion-level AI diagnostic performance when using radiomic features.

METHODS

Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method, and thirty-two radiomic features had been extracted. Classification was investigated for the task of distinguishing malignant lesions (81% of the dataset) from benign lesions (19%). Two classifiers, linear discriminant analysis (LDA) and support vector machines (SVM), were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected area under the ROC curve (AUC) and (2) performance metric curves, which give the variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined as 1 minus the width of the 95% confidence interval of the classifier output, computed for each lesion and each classifier. Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold, and (2) the sureness of each lesion. The latter was used to identify lesions with better sureness with one classifier than with the other while maintaining lesion-based performance across the bootstrap iterations.
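The sureness definition can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that classifier outputs are scaled to [0, 1], that the 95% confidence interval is taken as the 2.5th to 97.5th percentile range of a lesion's outputs over the bootstrap iterations in which it fell in the test fold, and that untested iterations are marked with NaN. The function name `sureness_per_lesion` and the synthetic data are hypothetical.

```python
import numpy as np

def sureness_per_lesion(outputs):
    """Return 1 minus the 95% interval width of each lesion's classifier output.

    outputs: array of shape (n_iterations, n_lesions), values assumed in [0, 1];
    NaN marks iterations in which a lesion was not in the test fold.
    """
    lower = np.nanpercentile(outputs, 2.5, axis=0)
    upper = np.nanpercentile(outputs, 97.5, axis=0)
    return 1.0 - (upper - lower)

# Synthetic stand-in for per-lesion test outputs (not data from the study).
rng = np.random.default_rng(0)
n_iter, n_lesions = 2000, 5
centers = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
spreads = np.array([0.01, 0.05, 0.10, 0.05, 0.01])
outputs = np.clip(rng.normal(centers, spreads, size=(n_iter, n_lesions)), 0.0, 1.0)

# In a bootstrap, each lesion is drawn into the training sample in roughly
# 63.2% of iterations, so it has a test output in only the remaining ones.
in_training = rng.random((n_iter, n_lesions)) < 0.632
outputs[in_training] = np.nan

sureness = sureness_per_lesion(outputs)
print(np.round(sureness, 3))
```

Under these assumptions, a lesion whose output barely moves across iterations has sureness close to 1, and a lesion whose output varies widely has lower sureness.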

RESULTS

In the classification performance assessment, the median and 95% CI of the difference in AUC between the two classifiers did not show evidence of a difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than for the LDA classifier. With the SVM, there was a net gain of 33 benign lesions and 307 cancers that had higher sureness while maintaining lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance.
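The ΔAUC summary can be sketched as follows. This is an illustration, not the study's procedure: it resamples fixed synthetic scores instead of retraining LDA and SVM in every iteration, and it omits the 0.632+ bias correction. The helper names `auc_mann_whitney` and `summarize_delta` are hypothetical. The interval reported above contains zero, which is why it gives no evidence of a difference.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def summarize_delta(delta):
    """Median, 2.5th and 97.5th percentiles, and whether the interval contains zero."""
    median = float(np.median(delta))
    lo, hi = (float(v) for v in np.percentile(delta, [2.5, 97.5]))
    return median, lo, hi, bool(lo <= 0.0 <= hi)

# Synthetic scores from two hypothetical classifiers on the same lesions.
rng = np.random.default_rng(1)
n = 300
labels = rng.random(n) < 0.81          # roughly 81% malignant, as in the dataset
score_a = labels + rng.normal(0.0, 0.8, n)
score_b = labels + rng.normal(0.0, 0.8, n)

deltas = []
for _ in range(200):
    drawn = rng.integers(0, n, n)                 # bootstrap sample, with replacement
    oob = np.setdiff1d(np.arange(n), drawn)       # out-of-bag lesions act as the test fold
    deltas.append(auc_mann_whitney(score_a[oob], labels[oob])
                  - auc_mann_whitney(score_b[oob], labels[oob]))

median, lo, hi, contains_zero = summarize_delta(np.array(deltas))
print(f"dAUC = {median:.3f} [{lo:.3f}, {hi:.3f}], interval contains 0: {contains_zero}")
```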

CONCLUSIONS

When there is no evidence for difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complementary enhancement to other evaluation measures.
