Department of Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, California.
Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California.
Am J Respir Crit Care Med. 2021 Aug 15;204(4):445-453. doi: 10.1164/rccm.202007-2791OC.
Most lung cancers are diagnosed at an advanced stage. Presymptomatic identification of high-risk individuals can prompt earlier intervention and improve long-term outcomes. We aimed to develop a machine learning model that predicts a future diagnosis of lung cancer from routine clinical and laboratory data. We assembled data from 6,505 case patients with non-small cell lung cancer (NSCLC) and 189,597 contemporaneous control subjects and compared the accuracy of a novel machine learning model with that of a modified version of the well-validated 2012 Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial risk model (mPLCOm2012), using the area under the receiver operating characteristic curve (AUC), sensitivity, and diagnostic odds ratio (OR) as measures of model performance. Among ever-smokers in the test set, the machine learning model was more accurate than the mPLCOm2012 for identifying NSCLC 9-12 months before clinical diagnosis (P < 0.00001) and demonstrated an AUC of 0.86, a diagnostic OR of 12.3, and a sensitivity of 40.1% at a predefined specificity of 95%. In comparison, the mPLCOm2012 demonstrated an AUC of 0.79, a diagnostic OR of 7.4, and a sensitivity of 27.9% at the same specificity. The machine learning model was also more accurate than standard eligibility criteria for lung cancer screening, and more accurate than the mPLCOm2012 when both were applied to a screening-eligible population. Influential model variables included known risk factors and novel predictors such as white blood cell and platelet counts. In summary, the machine learning model was more accurate for early diagnosis of NSCLC than either standard screening eligibility criteria or the mPLCOm2012, demonstrating the potential to help prevent lung cancer deaths through early detection.
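The three performance measures named in the abstract (AUC, sensitivity at a predefined specificity, and diagnostic OR) can be illustrated with a short sketch. This is not the authors' code: the function names, the thresholding rule (flag a subject when the score exceeds a cutoff taken from the control-score distribution), and the toy data are all assumptions made for illustration.

```python
import math


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control
    (ties count as one half). Labels: 1 = case, 0 = control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, target_specificity=0.95):
    """Pick the lowest control score that keeps specificity at or above the
    target, then report the sensitivity at that cutoff.
    Assumed rule: a subject is flagged when score > threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    # Number of controls that must fall at or below the threshold;
    # the small epsilon guards against floating-point overshoot in ceil().
    k = max(1, math.ceil(target_specificity * len(neg) - 1e-9))
    threshold = neg[k - 1]
    specificity = sum(n <= threshold for n in neg) / len(neg)
    sensitivity = sum(p > threshold for p in pos) / len(pos)
    return sensitivity, specificity, threshold


def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test in cases divided by the odds in controls."""
    return (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))


# Hypothetical toy data (not from the study): 10 controls and 4 cases.
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.90,
              0.42, 0.60, 0.70, 0.95]
toy_labels = [0] * 10 + [1] * 4

if __name__ == "__main__":
    sens, spec, thr = sensitivity_at_specificity(toy_scores, toy_labels, 0.90)
    print("AUC:", auc(toy_scores, toy_labels))
    print("threshold:", thr, "sensitivity:", sens, "specificity:", spec)
    print("diagnostic OR:", diagnostic_odds_ratio(sens, spec))
```

Applying `diagnostic_odds_ratio` to the rounded figures in the abstract (sensitivity 0.401 or 0.279 at specificity 0.95) gives values near the reported diagnostic ORs; small differences are to be expected because the published inputs are rounded and the achieved specificity may not be exactly 95%.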