Mitra Jhimli, Ghose Soumya, Thawani Rajat
GE HealthCare, Niskayuna, NY 12309, USA.
Division of Hematology and Oncology, Knight Cancer Institute, Oregon Health & Science University (OHSU), Portland, OR 97239, USA.
Cancers (Basel). 2025 Aug 18;17(16):2679. doi: 10.3390/cancers17162679.
BACKGROUND/OBJECTIVES: Immunotherapy is a viable therapeutic approach for non-small cell lung cancer (NSCLC). Although PD-1/PD-L1 immune checkpoint inhibitors provide a significant survival benefit on average, the objective response rate is only around 20% as monotherapy and around 50% in combination with chemotherapy. PD-L1 immunohistochemistry (IHC) is used as a predictive biomarker, but its accuracy is subpar.
METHODS: In this work, we develop a machine learning (ML) method to predict response to immunotherapy in NSCLC from multimodal data: clinicopathological biomarkers together with tumor and peritumoral radiomic biomarkers extracted from CT images. We further learn a graph structure to capture the associations between biomarkers and treatment response. The graph is then used to generate sentences stating clinical hypotheses, which are passed to a large language model (LLM) that explains the treatment response in terms of the biomarkers, in language comprehensible to clinicians. From a retrospective study, a training dataset of n = 248 NSCLC tumors from 140 subjects was used for feature selection, ML model training, learning the graph structure, and fine-tuning the LLM.
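The step from a learned graph to hypothesis sentences can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the `Edge` type, the signed `weight` scale, the `threshold` value, the sentence template, and the biomarker names in the example are all assumptions introduced here, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    """One learned association in the graph (hypothetical representation)."""
    source: str    # biomarker node
    target: str    # outcome node, e.g. "treatment response"
    weight: float  # signed association strength (assumed scale)


def edge_to_sentence(edge: Edge, threshold: float = 0.1) -> Optional[str]:
    """Render one edge as a plain-language hypothesis, or None if it is weak."""
    if abs(edge.weight) < threshold:
        return None
    direction = "higher" if edge.weight > 0 else "lower"
    return (
        f"Higher {edge.source} is associated with "
        f"{direction} likelihood of {edge.target}."
    )


def graph_to_sentences(edges: List[Edge], threshold: float = 0.1) -> List[str]:
    """Turn all sufficiently strong edges into sentences, strongest first."""
    ranked = sorted(edges, key=lambda e: abs(e.weight), reverse=True)
    sentences = []
    for edge in ranked:
        sentence = edge_to_sentence(edge, threshold)
        if sentence is not None:
            sentences.append(sentence)
    return sentences


# Placeholder edges for illustration; names and weights are not from the study.
example_edges = [
    Edge("PD-L1 expression", "treatment response", 0.40),
    Edge("peritumoral texture entropy", "treatment response", -0.25),
    Edge("age", "treatment response", 0.02),
]

for line in graph_to_sentences(example_edges):
    print(line)
```

Sentences of this kind could then serve as prompt context or fine-tuning text for the LLM; how the study actually formats and uses them is not stated in the abstract.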
RESULTS: The method achieved an AUC of 0.83 for prediction of treatment response on a separate test dataset of n = 84 tumors from 47 subjects.
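For readers less familiar with the metric, the AUC is the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder, with ties counted as half. The sketch below computes it directly from that definition. It is a generic illustration: the function name `roc_auc` and the 0/1 label encoding are assumptions, and the abstract does not state which software the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for responder, 0 for non-responder (assumed encoding).
    scores: model outputs, where higher means more likely to respond.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # responder ranked above non-responder
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of this size; a rank-based formulation gives the same value more efficiently on large datasets.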
CONCLUSIONS: Our study therefore not only improves the prediction of immunotherapy response in patients with NSCLC from multimodal data but also helps clinicians interpret those predictions by providing language-based explanations.