Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
Ann Surg Oncol. 2024 Nov;31(12):7738-7749. doi: 10.1245/s10434-024-15762-3. Epub 2024 Jul 16.
Lung cancer is a global health threat, and early detection and precise staging are needed to improve patient outcomes. This study aimed to develop and validate a machine learning-based risk model for early lung cancer screening and staging using routine clinical data.
Observational, retrospective studies were conducted at two medical centers, involving 2312 patients with lung cancer and 653 patients with benign nodules. Machine learning techniques, including differential analysis and feature selection, were used to identify key factors for modeling. Key variables included nodule density, carcinoembryonic antigen (CEA), age, and lifestyle habits. A logistic regression model was used for early diagnosis, and an XGBoost model was used for staging based on the selected features.
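As an illustration of the model type used for early diagnosis, the following Python sketch fits a plain logistic regression by batch gradient descent on the log-loss. It is not the authors' implementation: the abstract does not report the software, hyperparameters, final feature set, or coefficients, and the sketch omits the differential-analysis and feature-selection steps. The feature names (including a binary `smoker` flag standing in for "lifestyle habits"), the assumption that features are already z-scored, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability of malignancy for one patient's feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical, made-up toy data (NOT from the study). Columns are assumed to be
# z-scored [nodule_density, cea, age, smoker]; label 1 = lung cancer, 0 = benign nodule.
X_TOY = [
    [1.2, 0.8, 0.5, 1.0],
    [0.9, 1.1, 0.7, 1.0],
    [1.5, 0.4, 0.2, 0.0],
    [-0.8, -0.6, -0.4, 0.0],
    [-1.1, -0.9, 0.1, 1.0],
    [-0.5, -1.2, -0.9, 0.0],
]
Y_TOY = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = fit_logistic_regression(X_TOY, Y_TOY)
    for x, label in zip(X_TOY, Y_TOY):
        print(label, round(predict_proba(weights, bias, x), 3))
```

In practice a library implementation with regularization and cross-validated tuning would normally replace this hand-written loop; the sketch only makes the underlying risk function, a sigmoid of a weighted sum of features, explicit.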
For early diagnosis, the logistic regression model achieved an area under the curve (AUC) of 0.716 (95% confidence interval [CI] 0.607-0.826), with a sensitivity of 0.703 and a specificity of 0.654. The XGBoost model performed well in distinguishing late-stage from early-stage lung cancer, with an AUC of 0.913 (95% CI 0.862-0.963), a sensitivity of 0.909, and a specificity of 0.814. These findings highlight the models' potential to improve diagnostic accuracy and staging in lung cancer.
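The three metrics reported above can be computed as in the following Python sketch. The AUC is threshold-free (here computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), whereas sensitivity and specificity depend on a classification cutoff that the abstract does not state. The 0.5 cutoff and the scores below are hypothetical, and the sketch does not reproduce how the study's 95% CIs were obtained.

```python
from typing import List, Sequence, Tuple


def apply_threshold(scores: Sequence[float], cutoff: float) -> List[int]:
    """Classify a case as positive when its score is at or above the cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and scores for illustration only.
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
    print(roc_auc(y, s))  # 8 of 9 positive/negative pairs ranked correctly: 8/9
    print(sensitivity_specificity(y, apply_threshold(s, 0.5)))  # (2/3, 2/3)
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for illustration; a rank-based formula gives the same value more efficiently on large cohorts.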
This study introduces a novel machine learning-based risk model for early lung cancer screening and staging that uses routine clinical information and laboratory data. The model shows promise for improving diagnostic accuracy, reducing overdiagnosis, and improving patient outcomes.