Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Lung Nodule Malignancy Prediction.

Authors

Zhuang Luoting, Tabatabaei Seyed Mohammad Hossein, Salehi-Rad Ramin, Tran Linh M, Aberle Denise R, Prosper Ashley E, Hsu William

Affiliations

Medical & Imaging Informatics, Department of Radiological Sciences, David Geffen School of Medicine at UCLA, Los Angeles, 90095, CA, USA.

Department of Medicine, Division of Pulmonology and Critical Care, David Geffen School of Medicine at UCLA, Los Angeles, 90095, CA, USA.

Publication information

ArXiv. 2025 Aug 8:arXiv:2504.21344v2.

PMID:40799807

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12340779/

Abstract

OBJECTIVE

Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer.

METHODS

We obtained 938 low-dose CT scans, comprising 1,246 nodules with semantic features, from the National Lung Screening Trial (NLST). Additionally, the Lung Image Database Consortium dataset contains 1,018 CT scans, with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening (DLCS) dataset. For imaging input, we extracted 2D nodule slices in nine directions from a 50 × 50 × 50 mm nodule crop. We converted structured semantic features into sentences using Gemini. We fine-tuned a pretrained Contrastive Language-Image Pretraining (CLIP) model with a parameter-efficient fine-tuning approach to align imaging and semantic text features and to predict one-year lung cancer diagnosis.
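The nine-direction slicing step can be illustrated with a minimal sketch. The abstract does not say which nine directions were used; the version below assumes the three orthogonal central planes plus the six diagonal planes of the cube, a layout often used for multi-view nodule inputs. The function name `nine_plane_slices`, the `volume[z][y][x]` nested-list layout, and the assumption that the crop has already been resampled to a cubic voxel grid are illustrative choices, not details from the paper or its repository.

```python
from typing import Callable, List, Tuple

Plane = List[List[float]]
Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def nine_plane_slices(volume: Volume) -> List[Plane]:
    """Return nine 2D planes that pass through the centre of a cubic volume.

    Assumed layout (not confirmed by the source): axial, coronal, sagittal,
    then a diagonal and an anti-diagonal plane for each pair of axes.
    """
    n = len(volume)
    if n == 0 or any(
        len(plane) != n or any(len(row) != n for row in plane) for plane in volume
    ):
        raise ValueError("expected a non-empty cubic volume")
    c = n // 2  # index of the central slice

    # Each function maps in-plane coordinates (i, j) to a voxel index (z, y, x).
    index_maps: List[Callable[[int, int], Tuple[int, int, int]]] = [
        lambda i, j: (c, i, j),          # axial
        lambda i, j: (i, c, j),          # coronal
        lambda i, j: (i, j, c),          # sagittal
        lambda i, j: (i, i, j),          # diagonal across (z, y)
        lambda i, j: (i, n - 1 - i, j),  # anti-diagonal across (z, y)
        lambda i, j: (i, j, i),          # diagonal across (z, x)
        lambda i, j: (i, j, n - 1 - i),  # anti-diagonal across (z, x)
        lambda i, j: (j, i, i),          # diagonal across (y, x)
        lambda i, j: (j, i, n - 1 - i),  # anti-diagonal across (y, x)
    ]

    planes: List[Plane] = []
    for index_map in index_maps:
        plane: Plane = []
        for i in range(n):
            row = []
            for j in range(n):
                z, y, x = index_map(i, j)
                row.append(volume[z][y][x])
            plane.append(row)
        planes.append(plane)
    return planes
```

The pure-Python indexing is only meant to make the geometry explicit. A real pipeline would more likely hold the crop in an array library and resample the diagonal planes, since their in-plane step along the diagonal is longer than the voxel spacing.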

RESULTS

Our model outperformed state-of-the-art (SOTA) models on the NLST test set, with an AUROC of 0.901 and an AUPRC of 0.776. It also showed robust results on the external datasets. Through zero-shot inference with CLIP, we also obtained predictions of semantic features, such as nodule margin (AUROC: 0.812), nodule consistency (0.812), and pleural attachment (0.840).
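Zero-shot prediction of a semantic feature with a CLIP-style model generally works by comparing an image embedding against text embeddings of candidate descriptions. The sketch below shows only that scoring step, assuming the embeddings have already been produced by the image and text encoders. The prompt wording, the toy 3-dimensional vectors, the function names, and the default `logit_scale` of 100 (a value commonly seen in released CLIP checkpoints) are assumptions for illustration and are not taken from the paper or its repository.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def zero_shot_probabilities(
    image_embedding: Sequence[float],
    prompt_embeddings: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed temperature, not from the source
) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between one image and each prompt."""
    image = l2_normalize(image_embedding)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(image, l2_normalize(emb)))
        for label, emb in prompt_embeddings.items()
    }
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits.values())
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


if __name__ == "__main__":
    # Hypothetical embeddings; real ones would come from the fine-tuned encoders.
    image_vec = [0.8, 0.1, 0.2]
    prompts = {
        "a nodule with a spiculated margin": [0.9, 0.0, 0.1],
        "a nodule with a smooth margin": [0.1, 0.9, 0.2],
    }
    for label, prob in zero_shot_probabilities(image_vec, prompts).items():
        print(f"{label}: {prob:.3f}")
```

Per-feature AUROC values like those reported above would then be computed from such probabilities against the radiologists' annotations.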

CONCLUSION

Our approach surpasses SOTA models in predicting lung cancer across datasets collected from diverse clinical settings and provides explainable outputs that help clinicians understand the basis of model predictions. It also prevents the model from learning shortcuts and generalizes across clinical settings. The code is available at https://github.com/luotingzhuang/CLIP_nodule.
