Generalizability of lesion detection and segmentation when ScaleNAS is trained on a large multi-organ dataset and validated in the liver.

Author information

Ma Jingchen, Yang Hao, Chou Yen, Yoon Jin, Allison Tavis, Komandur Ravikumar, McDunn Jon, Tasneem Asba, Do Richard K, Schwartz Lawrence H, Zhao Binsheng

Affiliations

Department of Radiology, Columbia University Irving Medical Center, New York, New York, USA.

Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.

Publication information

Med Phys. 2025 Feb;52(2):1005-1018. doi: 10.1002/mp.17504. Epub 2024 Nov 22.

Abstract

BACKGROUND

Tumor assessment through imaging is crucial for diagnosing and treating cancer. Lesions in the liver, a common site for metastatic disease, are particularly challenging to accurately detect and segment. This labor-intensive task is subject to individual variation, which drives interest in automation using artificial intelligence (AI).

PURPOSE

Evaluate AI for lesion detection and lesion segmentation using CT in the context of human performance on the same task. Use internal testing to determine how an AI-developed model (ScaleNAS) trained on lesions in multiple organs performs when tested specifically on liver lesions in a dataset integrating real-world and clinical trial data. Use external testing to evaluate whether ScaleNAS's performance generalizes to publicly available colorectal liver metastases (CRLM) from The Cancer Imaging Archive (TCIA).

METHODS

The CUPA study dataset included patients whose CT scans of the chest, abdomen, or pelvis at Columbia University between 2010 and 2020 indicated solid tumors (CUIMC, n = 5011) and patients from two clinical trials in metastatic colorectal cancer, PRIME (n = 1183) and Amgen (n = 463). Inclusion required ≥1 measurable lesion; exclusion criteria eliminated 1566 patients. Data were divided at the patient level into training (n = 3996), validation (n = 570), and testing (n = 1529) sets. To create the reference standard for training and validation, each case was annotated by one of six randomly assigned radiologists, who marked the CUPA lesions without access to any previous annotations. For internal testing, we refined the CUPA test set to contain only patients who had liver lesions (n = 525) and formed an enhanced reference standard through expert consensus review of prior annotations. For external testing, TCIA-CRLM (n = 197) formed the test set. The reference standard for TCIA-CRLM was formed by consensus review of the original annotations and contours by two new radiologists. Metrics for lesion detection were sensitivity and false positives. Lesion segmentation was assessed with median Dice coefficient, under-segmentation ratio (USR), and over-segmentation ratio (OSR). Subgroup analysis examined the influence of lesion size, comparing lesions ≥10 mm (measurable by RECIST 1.1) with all lesions (important for early identification of disease progression).
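To make the segmentation metrics concrete, the following is a minimal Python sketch, not the authors' code. The Dice coefficient uses its standard definition, 2|P∩R| / (|P| + |R|), for a predicted mask P and a reference mask R. The abstract does not define USR or OSR, so the formulas below (missed reference voxels and extra predicted voxels, each divided by the reference size) are assumptions chosen for illustration. The function names and toy masks are likewise hypothetical.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Standard Dice overlap between two binary masks: 2|P∩R| / (|P| + |R|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a convention, not from the source).
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / total)


def under_over_ratios(pred, ref):
    """ASSUMED definitions (the abstract does not give them):
    USR = |R minus P| / |R|  (fraction of the reference that was missed)
    OSR = |P minus R| / |R|  (extra predicted voxels relative to reference size)
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    ref_size = ref.sum()
    if ref_size == 0:
        raise ValueError("reference mask is empty; ratios are undefined")
    usr = float(np.logical_and(ref, ~pred).sum() / ref_size)
    osr = float(np.logical_and(pred, ~ref).sum() / ref_size)
    return usr, osr


# Toy 2D masks (hypothetical data, not from the study).
ref_mask = np.zeros((4, 4), dtype=bool)
ref_mask[1:3, 1:3] = True            # 4 reference voxels
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:4] = True           # 6 predicted voxels, 4 of them overlapping

print(dice_coefficient(pred_mask, ref_mask))   # 2*4 / (6+4) = 0.8
print(under_over_ratios(pred_mask, ref_mask))  # (0.0, 0.5)
```

In this toy case the prediction covers the whole reference and spills over by two voxels, so nothing is under-segmented and the over-segmentation ratio is 0.5 under the assumed definition. A study-level summary such as the median Dice would be taken over the per-lesion values.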

RESULTS

ScaleNAS trained on all lesions achieved a sensitivity of 71.4% and a Dice of 70.2% for liver lesions in the CUPA internal test set (3,495 lesions), and a sensitivity of 68.2% and a Dice of 64.2% in the TCIA-CRLM external test set (638 lesions). Human radiologists had a mean sensitivity of 53.5% and a Dice of 73.9% in CUPA, and a sensitivity of 84.1% and a Dice of 88.4% in TCIA-CRLM. Performance improved for both ScaleNAS and radiologists in the subgroup that excluded sub-centimeter lesions.
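The size-based subgroup comparison can be illustrated with a short Python sketch. It assumes each reference lesion has already been labeled as detected or missed. The abstract does not state how a detection is matched to a reference lesion, so that step is left out. The `Lesion` class, the `sensitivity` helper, and the toy values are hypothetical and are not study data.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    diameter_mm: float   # longest diameter of the reference lesion
    detected: bool       # whether the reader or model found it (matching rule assumed)


def sensitivity(lesions, min_diameter_mm=0.0):
    """Fraction of reference lesions at or above the size threshold that were detected."""
    subset = [les for les in lesions if les.diameter_mm >= min_diameter_mm]
    if not subset:
        raise ValueError("no lesions at or above the size threshold")
    return sum(les.detected for les in subset) / len(subset)


# Toy lesion list (hypothetical values for illustration only).
toy_lesions = [
    Lesion(4.0, False),
    Lesion(7.5, True),
    Lesion(12.0, True),
    Lesion(15.0, False),
    Lesion(25.0, True),
]

all_lesions = sensitivity(toy_lesions)                      # 3 of 5 detected
measurable = sensitivity(toy_lesions, min_diameter_mm=10)   # 2 of 3 detected
print(f"all lesions: {all_lesions:.3f}, >=10 mm: {measurable:.3f}")
```

With the 10 mm threshold the two sub-centimeter lesions drop out of the denominator, which is why subgroup sensitivity can differ from the all-lesion figure.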

CONCLUSIONS

Our study presents the first evaluation of ScaleNAS in medical imaging, demonstrating its liver lesion detection and segmentation performance across diverse datasets. Using consensus reference standards from multiple radiologists, we addressed inter-observer variability and contributed to consistency in lesion annotation. While ScaleNAS does not surpass radiologists in performance, it offers fast and reliable results with potential utility in providing initial contours for radiologists. Future work will extend this model to lung and lymph node lesions, ultimately aiming to enhance clinical applications by generalizing detection and segmentation across tissue types.
