Khan Aparajita, Choi Eunji, Su Chloe, Graber-Naidich Anna, Henry Solomon, Satoyoshi Mina L, Bhat Archana, Kurian Allison W, Liang Su-Ying, Neal Joel, Gould Michael, Leung Ann, Wakelee Heather A, Backhus Leah M, Langlotz Curtis, Wu Julie, Han Summer S
Department of Computer Science and Engineering, Indian Institute of Technology (BHU) Varanasi, Varanasi, India.
Quantitative Sciences Unit, Department of Medicine, Stanford University School of Medicine, Stanford, CA.
JCO Clin Cancer Inform. 2025 Jul;9:e2400279. doi: 10.1200/CCI-24-00279. Epub 2025 Jul 23.
Despite its routine use to monitor patients with lung cancer (LC), real-world evaluations of the impact of computed tomography (CT) surveillance on overall survival (OS) have been inconsistent. A major confounder is the absence of imaging indications: patients undergo CT scans for purposes beyond surveillance, such as evaluation of symptoms (eg, cough) that are linked to poor survival. We propose a novel natural language processing model to predict CT imaging indications (surveillance vs others).
We used electronic health records of 585 long-term LC survivors (≥5 years) at Stanford, followed for up to 22 years. Their 3,362 post-5-year CT reports (1,672 of them manually annotated) were used to build a model that integrates structured variables (eg, CT intervals) with key-phrase analysis of the radiology reports. As in previous studies, a naïve analysis compared OS between patients who had post-5-year CT for any indication (including symptoms) and those without post-5-year CT. Using model-predicted indications, we conducted exploratory analyses comparing OS between patients with post-5-year surveillance CT and those without.
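The abstract does not give the exact feature definitions or the key-phrase lexicon. As a minimal sketch under stated assumptions, the two kinds of predictors it names (a structured CT-interval flag and surveillance-related key phrases) could be derived for one report as below. The interval is assumed to be counted in whole calendar months, and `SURVEILLANCE_PHRASES`, `months_between`, and `ct_features` are hypothetical names for illustration, not the study's code.

```python
import re
from datetime import date

# HYPOTHETICAL phrase list for illustration only; the study's lexicon is not given.
SURVEILLANCE_PHRASES = ("surveillance", "routine follow-up", "history of lung cancer")


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later` (assumed interval definition)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def ct_features(previous_ct: date, current_ct: date, report_text: str) -> dict:
    """Combine a structured interval flag with a key-phrase count for one CT report."""
    text = report_text.lower()
    # Count whole-phrase matches of each hypothetical surveillance phrase.
    phrase_hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        for phrase in SURVEILLANCE_PHRASES
    )
    return {
        # 1 if at least 6 months have passed since the previous CT, else 0
        "interval_ge_6_months": int(months_between(previous_ct, current_ct) >= 6),
        "surveillance_phrase_count": phrase_hits,
    }
```

For example, a report dated 2020-08-01 whose previous CT was on 2020-01-15 and whose text reads "CT chest for routine follow-up. History of lung cancer." yields an interval flag of 1 and a phrase count of 2. Features of this kind could then be fed to a classifier; the odds ratios reported in the results are consistent with a logistic-type model, but the abstract does not state the model form.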
The model showed high discrimination (AUC, 0.86), with key predictors including a longer interval (≥6 months) since the previous CT (odds ratio [OR], 5.50; P < .001) and surveillance-related key phrases (OR, 1.37; P = .03). Propensity-adjusted survival analysis indicated better OS for patients with any post-5-year surveillance CT versus those without (adjusted hazard ratio, 0.60; P = .016). By contrast, no significant survival difference was found (P = .53) between patients with any post-5-year CT and those without.
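An AUC of 0.86 can be read as the probability that a randomly chosen surveillance CT receives a higher model score than a randomly chosen non-surveillance CT, assuming surveillance is coded as the positive class. The abstract does not say how the AUC was computed; the following is a generic sketch of the rank-based (Mann-Whitney) estimate, with `auc` as an illustrative function name, not the study's code.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    labels: 1 = positive class (assumed here to be surveillance), 0 = other.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With toy scores `[0.9, 0.4, 0.7, 0.3, 0.5]` and labels `[1, 1, 0, 0, 0]`, four of the six positive-negative pairs are ranked correctly, giving an AUC of 4/6. The pairwise loop is chosen for readability; for thousands of reports, a sort-based rank formula or scikit-learn's `roc_auc_score` is more efficient.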
Our model abstracted CT indications from real-world data with high discrimination. Exploratory analyses revealed an imaging-OS association that is obscured when indications are not considered, highlighting the model's potential for future real-world studies.