Le Thuy T T, Yang Jiongxuan, Zhao Zimo, Zhang Kaidi, Li Wenjun, Hu Yan
University of Michigan School of Public Health, Department of Health Management and Policy, Ann Arbor, MI, USA.
University of Michigan School of Public Health, Department of Biostatistics, Ann Arbor, MI, USA.
medRxiv. 2025 Jun 20:2025.06.18.25329854. doi: 10.1101/2025.06.18.25329854.
Quitting smoking is the most effective way to reduce mortality and morbidity among current smokers. In 2022, about half of smokers attempted to quit, but only one-tenth succeeded.
To identify key predictors of smoking cessation success to inform cessation interventions and increase quitting rates.
We analyzed data from waves 5 and 6 of the Population Assessment of Tobacco and Health (PATH) study (December 2018 to November 2021). Using OpenAI's GPT-4.1, and based only on the descriptions of survey variables, we identified the 45 wave 5 variables judged most predictive of 12-month smoking abstinence in wave 6. We then validated the predictive power of the GPT-4.1-selected variables by comparing the performance of eXtreme Gradient Boosting (XGBoost) models trained on different sets of variables. Finally, we derived insights into the top 10 variables, ranked by their SHapley Additive exPlanations (SHAP) values.
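To make the SHAP-based ranking step more concrete, the following is a minimal sketch of what a Shapley value is, computed by brute force for a single prediction. It is not the study's pipeline: the toy scoring function, its coefficients, the feature names, and the baseline input are all hypothetical, and SHAP tooling for tree models uses a much faster algorithm than the exhaustive enumeration shown here.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, relative to a baseline input.

    Each feature's value is its marginal effect on the prediction,
    averaged over every order in which features can be switched from
    their baseline value to their observed value.
    """
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orders) for t in totals]


def toy_model(features):
    """Hypothetical abstinence score; NOT the study's XGBoost model."""
    days_smoked, minutes_to_first, uses_ecig = features
    score = 0.5 - 0.01 * days_smoked + 0.002 * minutes_to_first
    # Hypothetical interaction between two of the features.
    if uses_ecig and days_smoked > 20:
        score -= 0.05
    return score


# Hypothetical respondent and reference respondent.
x = [30, 5, 1]
baseline = [15, 30, 0]

phi = shapley_values(toy_model, x, baseline)

# Rank features by absolute contribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The contributions in `phi` sum to the difference between the prediction for `x` and the prediction for `baseline`. Averaging the absolute values of such per-respondent contributions across a sample is one common way to obtain a global variable ranking; whether the authors aggregated in exactly this way is not stated in the abstract.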
XGBoost trained on all possible wave 5 variables and XGBoost trained on the 45 selected variables performed almost identically (AUC: 0.749 vs 0.752, respectively). The top 10 variables included past 30-day smoking frequency, minutes from waking to smoking the first cigarette, important people's views on tobacco use, prevalence of tobacco use among close associates, daily electronic nicotine product use, emotional dependence, and health harm concerns.
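For readers less familiar with the AUC metric used in this comparison, here is a small illustrative sketch of how it can be computed from predicted scores. The labels and scores are made up for illustration and are unrelated to the PATH data; the function uses the pairwise-ranking definition of AUC rather than any particular library.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via its pairwise-ranking definition.

    AUC is the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = abstinent at follow-up) and scores from
# two hypothetical models trained on different variable sets.
labels = [1, 0, 1, 0, 0, 1]
scores_model_a = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
scores_model_b = [0.8, 0.3, 0.75, 0.4, 0.2, 0.9]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
```

Because AUC depends only on how cases are ranked, two models with quite different raw scores can have nearly equal AUCs, which is why it is a convenient single number for comparing models trained on different variable sets.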
This study demonstrates the ability of OpenAI's GPT-4.1 to identify the top 45 PATH wave 5 variables associated with 12-month smoking abstinence using only their descriptions. This approach could help researchers design more effective survey questionnaires and improve efficiency of data collection.