Zhuang Luoting, Tabatabaei Seyed Mohammad Hossein, Salehi-Rad Ramin, Tran Linh M, Aberle Denise R, Prosper Ashley E, Hsu William
Medical & Imaging Informatics, Department of Radiological Sciences, David Geffen School of Medicine at UCLA, Los Angeles, 90095, CA, USA.
Department of Medicine, Division of Pulmonology and Critical Care, David Geffen School of Medicine at UCLA, Los Angeles, 90095, CA, USA.
ArXiv. 2025 Aug 8:arXiv:2504.21344v2.
Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists' assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer.
We obtained 938 low-dose CT scans from the National Lung Screening Trial (NLST) with 1,246 nodules and associated semantic features. We also used the Lung Image Database Consortium (LIDC) dataset, which contains 1,018 CT scans with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening study. For imaging input, we extracted 2D nodule slices along nine directions from a 50 × 50 × 50 mm nodule crop. We converted the structured semantic features into sentences using Gemini. We then fine-tuned a pretrained Contrastive Language-Image Pretraining (CLIP) model with a parameter-efficient fine-tuning approach to align imaging and semantic text features and predict one-year lung cancer diagnosis.
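The abstract does not specify which nine directions are used for the 2D slices. A minimal sketch, assuming the common choice of three orthogonal central slices plus the six diagonal planes (main and anti-diagonal for each axis pair) through a cubic crop:

```python
import numpy as np

def nine_view_slices(vol):
    """Extract nine 2D views from a cubic 3D nodule crop:
    three orthogonal central slices plus six diagonal planes
    (main and anti-diagonal for each of the three axis pairs).
    The slice set is an assumption, not taken from the paper."""
    d = vol.shape[0]
    assert vol.shape == (d, d, d), "expects a cubic crop"
    views = [
        vol[d // 2, :, :],   # axial central slice
        vol[:, d // 2, :],   # coronal central slice
        vol[:, :, d // 2],   # sagittal central slice
    ]
    for ax1, ax2 in [(0, 1), (0, 2), (1, 2)]:
        # main diagonal plane for this axis pair
        views.append(np.diagonal(vol, axis1=ax1, axis2=ax2).T)
        # anti-diagonal plane: flip one axis, then take the diagonal
        views.append(np.diagonal(np.flip(vol, axis=ax1), axis1=ax1, axis2=ax2).T)
    return views

# toy example: a 50-voxel cubic crop (voxel size would map 50 voxels to 50 mm
# only at 1 mm isotropic spacing; resampling is assumed to happen upstream)
crop = np.random.rand(50, 50, 50)
slices = nine_view_slices(crop)
```

Each returned view is a `d × d` array, so all nine can be batched directly into a 2D image encoder.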
Our model outperformed state-of-the-art (SOTA) models on the NLST test set with an AUROC of 0.901 and an AUPRC of 0.776, and showed robust results on the external datasets. Using CLIP, we also obtained zero-shot predictions of semantic features such as nodule margin (AUROC: 0.812), nodule consistency (0.812), and pleural attachment (0.840).
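The zero-shot mechanism above follows the standard CLIP recipe: compare the image embedding against one text embedding per candidate label and softmax the similarities. A minimal sketch of that scoring step, with random vectors standing in for the actual CLIP encoder outputs and hypothetical margin prompts:

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: cosine similarity between one
    image embedding and one text embedding per candidate label,
    scaled by a temperature and softmaxed into label probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)   # one logit per label prompt
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy example: random embeddings stand in for encoder outputs; the two
# prompts (e.g. "smooth margin" vs "spiculated margin") are illustrative.
rng = np.random.default_rng(0)
image_emb = rng.standard_normal(512)         # encoded nodule view
text_embs = rng.standard_normal((2, 512))    # one embedding per prompt
probs = zero_shot_scores(image_emb, text_embs)
```

In the actual pipeline the embeddings would come from the fine-tuned CLIP image and text encoders, and the probability for the positive-class prompt serves as the prediction for that semantic feature.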
Our approach surpasses SOTA models in predicting lung cancer across datasets collected from diverse clinical settings, and its explainable outputs help clinicians understand the basis of model predictions. The approach also discourages the model from learning shortcuts, improving generalization across clinical settings. The code is available at https://github.com/luotingzhuang/CLIP_nodule.