Wang Qiaoli, Liang Tao, Li Yuexi, Liu Xiaoqin
Department of Health Screening Center, Deyang Peoples' Hospital, Deyang, Sichuan, 618000, People's Republic of China.
Department of Gastroenterology, Deyang Peoples' Hospital, Deyang, Sichuan, 618000, People's Republic of China.
Cancer Manag Res. 2024 May 30;16:527-535. doi: 10.2147/CMAR.S454638. eCollection 2024.
The aim of this study was to evaluate the potential value of blood inflammatory markers in the diagnosis of non-small cell lung cancer (NSCLC) and to propose a machine-learning-based method to predict NSCLC in asymptomatic adults.
A cross-sectional study was conducted using the medical records of 139 patients with NSCLC and physical examination data of 198 healthy controls collected from May 2022 to May 2023. The NSCLC cohort comprised 128 cases of adenocarcinoma, 3 cases of squamous cell carcinoma, and 8 cases of other NSCLC subtypes. The correlation between NSCLC and inflammatory and nutritional markers was examined, including monocytes, neutrophils, the lymphocyte-to-monocyte ratio (LMR), the neutrophil-to-lymphocyte ratio (NLR), the platelet-to-lymphocyte ratio (PLR), and the platelet-to-hemoglobin ratio (PHR). Features were selected using a Python feature selection library and analyzed with five machine learning algorithms. The predictive ability of each model for NSCLC diagnosis was assessed by precision, accuracy, recall, F1 score, and area under the curve (AUC).
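The ratio markers named above are simple quotients of routine blood count values. A minimal sketch of how they could be derived is given below; the function name, the units, and the sample values are assumptions for illustration and are not taken from the study.

```python
def inflammation_ratios(lymphocytes, monocytes, neutrophils, platelets, hemoglobin):
    """Return LMR, NLR, PLR and PHR for a single blood count.

    Assumed units (not stated in the abstract): cell counts in 10^9/L,
    hemoglobin in g/L. The helper itself is hypothetical.
    """
    return {
        "LMR": lymphocytes / monocytes,    # lymphocyte-to-monocyte ratio
        "NLR": neutrophils / lymphocytes,  # neutrophil-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,    # platelet-to-lymphocyte ratio
        "PHR": platelets / hemoglobin,     # platelet-to-hemoglobin ratio
    }


# Made-up values for illustration only; these are not study data.
ratios = inflammation_ratios(
    lymphocytes=2.0,
    monocytes=0.5,
    neutrophils=4.0,
    platelets=250.0,
    hemoglobin=125.0,
)
print(ratios)
```

In a screening dataset, such derived ratios would typically be added as extra columns alongside the raw counts before feature selection.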
The 14 most important features were platelet distribution width (PDW), age, total protein (TP), red blood cell count (RBC), hemoglobin (HGB), lymphocyte count (LYM), lymphocyte percentage (LYM%), red cell distribution width (RDW), PLR, LMR, PHR, monocyte count (MONO), monocyte percentage (MONO%), and gender. Among the five machine learning algorithms, naive Bayes (NB) showed the highest overall performance in predicting NSCLC in adults, achieving an accuracy of 0.87, a macro-average F1 score of 0.85, a weighted-average F1 score of 0.87, and an AUC of 0.84.
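The macro-average and weighted-average F1 scores reported above differ only in how per-class F1 scores are combined: the macro average treats each class equally, while the weighted average weights each class by its number of true samples. A minimal sketch in plain Python follows; the helper names and the toy labels are assumptions for illustration and do not reproduce the study's data or code.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro-average F1, support-weighted average F1)."""
    labels = sorted(set(y_true))
    scores = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    support = {c: y_true.count(c) for c in labels}
    macro = sum(scores.values()) / len(labels)
    weighted = sum(scores[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


# Toy labels for illustration only (0 = control, 1 = NSCLC); not study data.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
macro_f1, weighted_f1 = macro_and_weighted_f1(y_true, y_pred)
print(round(macro_f1, 3), round(weighted_f1, 3))
```

When the larger class is predicted better than the smaller one, as in this toy example, the weighted average exceeds the macro average.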
Platelet distribution width ranked as the most important feature, and the NB algorithm performed best in predicting an NSCLC diagnosis in adults.