Li Thomas Z, Xu Kaiwen, Krishnan Aravind, Gao Riqiang, Kammer Michael N, Antic Sanja, Xiao David, Knight Michael, Martinez Yency, Paez Rafael, Lentz Robert J, Deppen Stephen, Grogan Eric L, Lasko Thomas A, Sandler Kim L, Maldonado Fabien, Landman Bennett A
Medical Scientist Training Program, Vanderbilt University, Nashville, Tenn.
Department of Biomedical Engineering, Vanderbilt University, 2301 Vanderbilt Pl, Nashville, TN 37235.
Radiol Artif Intell. 2025 Mar;7(2):e230506. doi: 10.1148/ryai.230506.
Purpose: To evaluate the performance of eight lung cancer prediction models on patient cohorts with screening-detected, incidentally detected, and bronchoscopically biopsied pulmonary nodules.
Materials and Methods: This retrospective study evaluated promising lung cancer prediction models in three clinical settings: lung cancer screening with low-dose CT, incidentally detected pulmonary nodules, and nodules deemed suspicious enough to warrant a biopsy. The area under the receiver operating characteristic curve (AUC) was assessed for eight validated models in nine cohorts (n = 898, 896, 882, 219, 364, 117, 131, 115, 373) from multiple institutions. The models included logistic regressions on clinical variables and radiologist nodule characterizations, artificial intelligence (AI) on chest CT scans, longitudinal imaging AI, and multimodal approaches for predicting lung cancer risk. Each model was implemented from its published literature, and each cohort was curated from primary data sources collected over periods from 2002 to 2021.
Results: No single predictive model emerged as the highest-performing model across all cohorts, but certain models performed better in specific clinical contexts. Single-time-point chest CT AI performed well for screening-detected nodules but did not generalize well to other clinical settings. Longitudinal imaging and multimodal models demonstrated comparatively good performance on incidentally detected nodules. When applied to biopsied nodules, all models showed low performance.
Conclusion: Eight lung cancer prediction models failed to generalize well across clinical settings and sites outside of their training distributions.
Keywords: Diagnosis, Classification, Application Domain, Lung
© RSNA, 2025. See also commentary by Shao and Niu in this issue.
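The evaluation metric named in the abstract, area under the receiver operating characteristic curve, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `auc_by_cohort`, the cohort names, and all labels and scores below are hypothetical toy values chosen only to show how one model's risk scores might be scored separately on each cohort.

```python
from typing import Dict, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_cohort(
    cohorts: Dict[str, Tuple[Sequence[int], Sequence[float]]]
) -> Dict[str, float]:
    """Score one model's predictions separately on each named cohort."""
    return {name: auc(y, s) for name, (y, s) in cohorts.items()}


# Hypothetical toy data (1 = cancer, 0 = benign); not taken from the study.
toy_cohorts = {
    "toy_screening": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "toy_biopsy": ([0, 1, 0, 1], [0.60, 0.50, 0.40, 0.45]),
}
results = auc_by_cohort(toy_cohorts)
for name, value in results.items():
    print(f"{name}: AUC = {value:.2f}")
```

Reporting the metric per cohort rather than pooled is what lets a comparison like the one described here show a model doing well in one clinical setting and poorly in another.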