Tian Fang, Lin Yongchun, Wang Liangjiao, Fang Fei, Hou Kaiwen
Department of Outpatient, Western Theater Command General Hospital of PLA, Chengdu, Sichuan, China.
Department of Emergency, Tibet Command General Hospital of PLA, Lhasa, China.
Front Med (Lausanne). 2025 Mar 4;11:1424750. doi: 10.3389/fmed.2024.1424750. eCollection 2024.
To assess the effectiveness of a feature self-recognition machine learning model in screening for pulmonary nodule risk in a physical examination population and to evaluate the constructed visualization system.
We analyzed data from 4,861 individuals who underwent chest CT as part of a physical examination at the Western Theater General Hospital of the People's Liberation Army between January and November 2023. Of these, 1,168 (24.0%) had CT reports positive for pulmonary nodules and 3,693 (76.0%) had negative findings. We built a machine learning model with the XGBoost algorithm and used an improved sooty tern optimization algorithm (ISTOA) for feature selection. The significance of the selected features was assessed by univariate analysis followed by multivariable stepwise logistic regression. A visualization system was built to estimate an individual's risk of pulmonary nodules.
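As a rough illustration of the cohort arithmetic and of the univariate screening step, the following minimal Python sketch computes the nodule prevalence and the negative-to-positive class ratio from the counts reported above, and an odds ratio with a Woolf (log-based) 95% confidence interval for a single binary factor. The class ratio is the value the XGBoost documentation suggests as a typical `scale_pos_weight` for imbalanced data; whether the authors used it is not stated and is assumed here only for illustration. The function names and the 2x2 counts in the usage example are hypothetical and are not taken from the study.

```python
import math

# Cohort counts reported in the abstract.
N_POSITIVE = 1168  # CT reports positive for pulmonary nodules
N_NEGATIVE = 3693  # CT reports negative
N_TOTAL = N_POSITIVE + N_NEGATIVE


def prevalence(positives: int, total: int) -> float:
    """Fraction of the cohort with a positive finding."""
    return positives / total


def negative_to_positive_ratio(negatives: int, positives: int) -> float:
    """Class ratio; a common heuristic value for XGBoost's scale_pos_weight.

    Assumption: the abstract does not say how class imbalance was handled.
    """
    return negatives / positives


def odds_ratio_with_ci(exposed_pos: int, exposed_neg: int,
                       unexposed_pos: int, unexposed_neg: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for one 2x2 table.

    All four cells must be non-zero. z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (exposed_pos * unexposed_neg) / (exposed_neg * unexposed_pos)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / exposed_pos + 1 / exposed_neg
                          + 1 / unexposed_pos + 1 / unexposed_neg)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    print(f"cohort size: {N_TOTAL}")
    print(f"prevalence: {prevalence(N_POSITIVE, N_TOTAL):.3f}")
    print(f"neg/pos ratio: {negative_to_positive_ratio(N_NEGATIVE, N_POSITIVE):.2f}")

    # Hypothetical toy table (NOT study data): 30/70 among exposed, 20/80 among unexposed.
    or_, lo, hi = odds_ratio_with_ci(30, 70, 20, 80)
    print(f"toy odds ratio: {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In a univariate screen of this kind, a factor whose confidence interval excludes 1 would typically be carried forward into the multivariable model; the thresholds the authors actually applied are not given in the abstract.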
Multivariable analysis identified older age, smoking or passive smoking, high psychological stress within the past year, occupational exposure (e.g., air pollution at the workplace), presence of chronic lung diseases, and elevated carcinoembryonic antigen levels as significant risk factors for pulmonary nodules. The feature self-recognition machine learning model further highlighted age, smoking or passive smoking, high psychological stress, occupational exposure, chronic lung diseases, family history of lung cancer, decreased albumin levels, and elevated carcinoembryonic antigen as key predictors for early pulmonary nodule risk, demonstrating superior performance.
The feature self-recognition machine learning model effectively aids in the early prediction and clinical identification of pulmonary nodule risk, facilitating timely intervention and improving patient prognosis.