Role of sureness in evaluating AI/CADx: Lesion-based repeatability of machine learning classification performance on breast MRI.

Affiliations

Department of Radiology, The University of Chicago, Chicago, Illinois, USA.

Department of Physics, Wheaton College, Wheaton, Illinois, USA.

Publication information

Med Phys. 2024 Mar;51(3):1812-1821. doi: 10.1002/mp.16673. Epub 2023 Aug 21.

DOI:10.1002/mp.16673

PMID:37602841

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10879454/

Abstract

BACKGROUND

Artificial intelligence/computer-aided diagnosis (AI/CADx) and its use of radiomics have shown potential in the diagnosis and prognosis of breast cancer. Performance metrics such as the area under the receiver operating characteristic (ROC) curve (AUC) are frequently used as figures of merit for the evaluation of CADx. Methods for evaluating lesion-based measures of performance may enhance the assessment of AI/CADx pipelines, particularly when comparing performance across classifiers.

PURPOSE

The purpose of this study was to investigate the use case of two standard classifiers to (1) compare overall classification performance of the classifiers in the task of distinguishing between benign and malignant breast lesions using radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images, (2) define a new repeatability metric (termed sureness), and (3) use sureness to examine if one classifier provides an advantage in AI diagnostic performance by lesion when using radiomic features.

METHODS

Images of 1052 breast lesions (201 benign, 851 cancers) had been retrospectively collected under HIPAA/IRB compliance. The lesions had been segmented automatically using a fuzzy c-means method, and thirty-two radiomic features had been extracted. Classification was investigated for the task of distinguishing malignant lesions (81% of the dataset) from benign lesions (19%). Two classifiers (linear discriminant analysis [LDA] and support vector machines [SVM]) were trained and tested within 0.632 bootstrap analyses (2000 iterations). Whole-set classification performance was evaluated at two levels: (1) the 0.632+ bias-corrected area under the ROC curve (AUC) and (2) performance metric curves, which give the variability in operating sensitivity and specificity at a target operating point (95% target sensitivity). Sureness was defined as 1 minus the width of the 95% confidence interval of the classifier output, computed for each lesion and each classifier. Lesion-based repeatability was evaluated at two levels: (1) repeatability profiles, which represent the distribution of sureness across the decision threshold, and (2) the sureness of each lesion. The latter was used to identify lesions with better sureness under one classifier than the other while maintaining lesion-based performance across the bootstrap iterations.
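
The sureness definition above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that classifier outputs lie in [0, 1], that the 95% interval is the 2.5th to 97.5th percentile range of a lesion's outputs over the bootstrap iterations in which that lesion was left out of the training sample, and that `score_fn` is a hypothetical callback standing in for training and applying a classifier such as LDA or SVM. All function names are illustrative.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def sureness(outputs):
    """1 minus the width of the 95% interval of one lesion's classifier outputs."""
    s = sorted(outputs)
    return 1.0 - (percentile(s, 97.5) - percentile(s, 2.5))


def out_of_bag_outputs(n_lesions, n_iter, score_fn, seed=0):
    """Collect each lesion's held-out outputs across bootstrap iterations.

    score_fn(train_idx, test_idx) is a hypothetical callback: it should train a
    classifier on train_idx and return one output in [0, 1] per test index.
    """
    rng = random.Random(seed)
    collected = [[] for _ in range(n_lesions)]
    for _ in range(n_iter):
        # Sample lesions with replacement; lesions never drawn form the test set.
        train = [rng.randrange(n_lesions) for _ in range(n_lesions)]
        in_train = set(train)
        test = [i for i in range(n_lesions) if i not in in_train]
        for i, out in zip(test, score_fn(train, test)):
            collected[i].append(out)
    return collected


def toy_score_fn(train_idx, test_idx):
    """Stand-in for a real classifier: a per-lesion base value plus a small
    jitter that changes with the bootstrap sample (illustrative only)."""
    jitter = (sum(train_idx) % 7) / 100.0
    return [min(1.0, (i % 10) / 10.0 + jitter) for i in test_idx]


per_lesion = out_of_bag_outputs(n_lesions=50, n_iter=200, score_fn=toy_score_fn)
scores = [sureness(outs) for outs in per_lesion if outs]
print(min(scores), max(scores))
```

A lesion whose output barely moves between bootstrap iterations has a narrow interval and a sureness near 1, while a lesion whose output swings across much of the [0, 1] range has a sureness near 0.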

RESULTS

In classification performance assessment, the median and 95% CI of difference in AUC between the two classifiers did not show evidence of difference (ΔAUC = -0.003 [-0.031, 0.018]). Both classifiers achieved the target sensitivity. Sureness was more consistent across the classifier output range for the SVM classifier than the LDA classifier. The SVM resulted in a net gain of 33 benign lesions and 307 cancers with higher sureness and maintained lesion-based performance. However, with the LDA there was a notable percentage of benign lesions (42%) with better sureness but lower lesion-based performance.
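
The comparison of AUC between classifiers can be sketched as follows. This is an illustrative outline only: it uses a plain empirical AUC rather than the 0.632+ bias correction named in the Methods, the abstract does not state which classifier is subtracted from which, and the function names are assumptions. The idea is to compute the AUC difference in each bootstrap iteration, summarize the differences by their median and 2.5th to 97.5th percentile interval, and check whether that interval contains zero.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of (malignant, benign) pairs ranked correctly,
    counting ties as half (the Mann-Whitney statistic)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def summarize_delta(deltas):
    """Median and 95% percentile interval of per-iteration AUC differences."""
    s = sorted(deltas)
    return percentile(s, 50), percentile(s, 2.5), percentile(s, 97.5)


def shows_difference(lower, upper):
    """True only when the 95% interval excludes zero."""
    return lower > 0 or upper < 0


# Interval reported in the abstract: [-0.031, 0.018]
print(shows_difference(-0.031, 0.018))
```

Because the reported interval spans zero, this check returns `False`, which matches the statement that the AUC comparison gave no evidence of a difference between the two classifiers.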

CONCLUSIONS

When there is no evidence for difference in performance between classifiers using AUC or other performance summary measures, a lesion-based sureness metric may provide additional insight into AI pipeline design. These findings present and emphasize the utility of lesion-based repeatability via sureness in AI/CADx as a complementary enhancement to other evaluation measures.
