Dai Hao, Huang Yu, He Xing, Zhou Tiancheng, Liu Yuxi, Zhang Xuhong, Guo Yi, Guo Jingchuan, Bian Jiang
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL.
Department of Biostatistics & Health Data Science, Indiana University School of Medicine, Indianapolis, IN.
JCO Clin Cancer Inform. 2025 May;9:e2400291. doi: 10.1200/CCI-24-00291. Epub 2025 May 7.
Low-dose computed tomography (LDCT) screening reduces lung cancer mortality by detecting the disease at earlier, more treatable stages. However, high false-positive rates and the risks of the invasive diagnostic procedures that follow present significant challenges. This study proposes a pipeline that integrates machine learning (ML) and causal inference techniques to optimize lung cancer screening decisions.
Using real-world data from the OneFlorida+ Clinical Research Consortium, we developed ML models to predict individual lung cancer risk and estimate the benefits of LDCT screening. Explainable artificial intelligence techniques were applied to identify key risk factors, ensuring transparency and trust in the model's predictions. Causal ML methods were used to estimate individualized treatment effects of LDCT screening, answering the critical what-if question regarding risk reduction from LDCT.
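The what-if estimate described above can be illustrated with a small sketch. The abstract does not say which causal ML estimators were used beyond naming a doubly robust model, so the code below assumes one common doubly robust construction, the augmented inverse probability weighting (AIPW) score. The function names (`aipw_scores`, `subgroup_effect`, `simulate`) are hypothetical, and every number in the simulation is synthetic and unrelated to the study's data or results.

```python
import random
from statistics import mean


def aipw_scores(y, t, e, mu1, mu0):
    """Per-patient AIPW scores (an assumed estimator, not necessarily the paper's).

    y   : observed outcome (1 = lung cancer, 0 = none)
    t   : treatment indicator (1 = screened with LDCT, 0 = not screened)
    e   : estimated probability of being screened (propensity score)
    mu1 : predicted outcome risk if screened
    mu0 : predicted outcome risk if not screened
    """
    scores = []
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        # Outcome-model contrast plus an inverse-probability-weighted residual.
        correction = ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        scores.append(m1 - m0 + correction)
    return scores


def subgroup_effect(scores, groups, label):
    """Average the scores within one subgroup to get a subgroup-level effect."""
    return mean(s for s, g in zip(scores, groups) if g == label)


def simulate(n=20000, seed=0):
    """Synthetic cohort with known effects: -0.15 for 'F' and -0.10 for 'M'."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = "F" if rng.random() < 0.5 else "M"
        e = 0.6 if group == "F" else 0.4        # synthetic screening probability
        t = 1 if rng.random() < e else 0
        mu0 = 0.30                              # synthetic risk without screening
        mu1 = 0.15 if group == "F" else 0.20    # synthetic risk with screening
        y = 1 if rng.random() < (mu1 if t else mu0) else 0
        rows.append((group, t, y, e, mu1, mu0))
    return rows


if __name__ == "__main__":
    groups, t, y, e, mu1, mu0 = map(list, zip(*simulate()))
    scores = aipw_scores(y, t, e, mu1, mu0)
    for label in ("F", "M"):
        print(label, round(subgroup_effect(scores, groups, label), 3))
```

The score is called doubly robust because its average remains a consistent estimate of the effect if either the propensity model or the outcome model is correctly specified. In practice both would be fitted from data, typically with cross-fitting, rather than supplied as known values as in this simulation.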
We defined a high-risk cohort of 5,947 patients who underwent LDCT, along with matched controls, to evaluate the framework. Our models achieved areas under the curve (AUCs) of 0.777 and 0.793 for 1-year and 3-year risk predictions, respectively. Causal modeling estimated a reduction in lung cancer risk attributable to LDCT that was consistent across subgroups. Specifically, the doubly robust model showed an average risk reduction of 9.5% for males and 12% for females. Age-stratified results indicated a 9.5% reduction for individuals age 50-60 years, a 7.5% reduction for those age 60-70 years, and the largest reduction, 15.1%, for those age 70-80 years.
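For readers less familiar with the AUC figures reported above, the following is a minimal sketch of how such a value can be computed from predicted risks and observed outcomes. It assumes the AUC is the area under the receiver operating characteristic curve, which the abstract does not state explicitly, and uses the rank-based (Mann-Whitney) formulation. The function name `roc_auc` and the toy inputs are illustrative only.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: probability that a random case outranks a random non-case.

    labels : 1 for patients who developed the outcome, 0 otherwise
    scores : predicted risks (higher means higher predicted risk)
    Tied scores receive their average rank.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # Mann-Whitney U statistic for the cases, scaled to the range [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to a model that ranks patients no better than chance, and 1.0 to a model that ranks every case above every non-case.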
Integrating ML and causal inference into clinical workflows offers a robust tool for enhancing lung cancer screening. This pipeline provides accurate risk assessments and actionable insights tailored to individuals, empowering clinicians and patients to make informed screening decisions. The differential risk reduction across subgroups highlights the importance of personalized screening in improving outcomes for populations at risk of lung cancer.