Pant Sudarshan, Yang Hyung Jeong, Cho Sehyun, Ryu EuiJeong, Choi Ja Yun
Department of Artificial Intelligence Convergence, Chonnam National University, Gwangju, Republic of Korea.
College of Nursing, Chonnam National University, Gwangju, Republic of Korea.
Digit Health. 2025 Apr 15;11:20552076251333660. doi: 10.1177/20552076251333660. eCollection 2025 Jan-Dec.
This study aims to develop and validate a deep learning model to predict smoking status in patients with chronic obstructive pulmonary disease (COPD) using data from a national survey.
Data from the Korea National Health and Nutrition Examination Survey (2007-2018) were used to extract 5466 COPD-eligible cases. The data comprised demographic, behavioral, and clinical variables, including 21 predictors such as age, sex, and pulmonary function test results. The dependent variable, smoking status, was categorized as smoker or nonsmoker. A residual neural network (ResNN) model was developed and compared with five machine learning algorithms (random forest, decision tree, Gaussian Naive Bayes, K-nearest neighbor, and AdaBoost) and two deep learning models (multilayer perceptron and TabNet). Internal validation was performed using five-fold cross-validation, and model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and F1-score.
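The evaluation metrics and the five-fold split named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, the contiguous (unshuffled, unstratified) folds, and the toy labels and scores are all assumptions made for illustration, with smoker coded as 1 and nonsmoker as 0.

```python
from typing import Iterator, List, Sequence, Tuple


def sensitivity_specificity_f1(y_true: Sequence[int],
                               y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, F1) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return sens, spec, f1


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise definition, O(n^2))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds covering range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec, f1 = sensitivity_specificity_f1(y_true, y_pred)
auc = auroc(y_true, scores)
folds = list(k_fold_indices(5466, k=5))  # 5466 cases, as in the study
```

In practice each fold's model would be fit on the training indices and scored on the held-out indices, with the metrics averaged across folds. A real analysis would usually shuffle and stratify the folds by smoking status.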
The ResNN achieved an AUROC, sensitivity, specificity, and F1-score of 0.73, 70.1%, 75.2%, and 0.67, respectively, outperforming the machine learning and deep learning models it was compared against in predicting smoking status in patients with COPD. Shapley additive explanations (SHAP), an explainable artificial intelligence method, identified key predictors, including sex, age, and perceived health status.
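To show what a Shapley-based attribution measures, the following sketch computes exact Shapley values by enumerating feature coalitions for a toy scoring function. It is not the SHAP library and not the authors' model: `shapley_values`, `toy_score`, the three-feature layout (sex, age, perceived health status), the weights, and the baseline are all hypothetical. Features outside a coalition are set to an assumed baseline value.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one instance x.

    Features not in a coalition take their baseline value. The cost is
    exponential in the number of features, so this only suits tiny examples.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z: Sequence[float]) -> float:
    """Hypothetical linear score over (sex, age, perceived health status).
    The weights are invented and carry no clinical meaning."""
    return 0.5 * z[0] + 0.02 * z[1] - 0.3 * z[2]


x = [1.0, 60.0, 2.0]          # instance to explain (made-up values)
baseline = [0.0, 50.0, 3.0]   # assumed reference instance
phi = shapley_values(toy_score, x, baseline)
```

For a linear score each attribution reduces to the weight times the feature's offset from the baseline, and the attributions sum to the difference between the score at `x` and the score at the baseline. Libraries such as SHAP approximate the same quantity for larger models where full enumeration is infeasible.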
This deep learning model predicts smoking status in patients with COPD with an AUROC of 0.73, offering potential as a decision-support tool to detect high-risk persistent smokers for targeted interventions. Future studies should focus on external validation and incorporate additional behavioral and psychological variables to improve the model's generalizability and performance.