Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, MI, USA.
Department of Oncology, School of Medicine, Georgetown University, Washington, DC, USA.
Nicotine Tob Res. 2023 Jul 14;25(8):1481-1488. doi: 10.1093/ntr/ntad066.
Cigarette smoking continues to pose a threat to public health. Identifying individual risk factors for smoking initiation is essential to further mitigate this epidemic. To the best of our knowledge, no study to date has used machine learning (ML) techniques to automatically uncover informative predictors of smoking onset among adults using the Population Assessment of Tobacco and Health (PATH) study.
In this work, we paired random forest with Recursive Feature Elimination to identify PATH variables that predict smoking initiation between two consecutive PATH waves among adults who had never smoked at baseline. We included all potentially informative baseline variables in wave 1 (respectively, wave 4) to predict past 30-day smoking status in wave 2 (respectively, wave 5). Using the first and most recent pairs of PATH waves was found sufficient to identify the key risk factors of smoking initiation and test their robustness over time. The eXtreme Gradient Boosting method was employed to test the quality of these selected variables.
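The selection-then-validation workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`, not PATH), the feature counts and hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for eXtreme Gradient Boosting so that the example needs only one library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for baseline survey variables (assumption: NOT PATH data).
# The positive class is the minority, loosely mimicking a rare outcome.
X, y = make_classification(
    n_samples=600,
    n_features=40,
    n_informative=6,
    n_redundant=4,
    weights=[0.8, 0.2],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive Feature Elimination: fit a random forest, drop the 5 least
# important features, refit, and repeat until 10 features remain.
# (Both numbers are illustrative choices, not values from the study.)
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=10,
    step=5,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)

# Check the quality of the selected features with a separate boosted model.
# Assumption: sklearn's gradient boosting stands in for XGBoost here.
booster = GradientBoostingClassifier(random_state=0)
booster.fit(X_train[:, selected], y_train)
auc = roc_auc_score(y_test, booster.predict_proba(X_test[:, selected])[:, 1])

print("selected feature indices:", selected.tolist())
print("held-out AUC with selected features:", round(auc, 3))
```

Using a second model family to score the selected features, as the text describes, gives some protection against the selection merely reflecting quirks of the random forest that produced it.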
Classification models suggested about 60 informative PATH variables among the many candidate variables in each baseline wave. With these selected predictors, the resulting models have high discriminatory power, with an area under the specificity-sensitivity curve of around 80%. Examining the chosen variables showed that, across the considered waves, two factors, (1) BMI and (2) dental and oral health status, robustly appeared as important predictors of smoking initiation, alongside other well-established predictors.
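To make the discrimination figure concrete, the sketch below computes an area under the curve by hand. It assumes the reported metric is the usual ROC AUC, which equals the probability that a randomly chosen positive case (here, someone who initiates smoking) receives a higher predicted score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = initiated smoking, 0 = did not.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
print(roc_auc(labels, scores))  # 8 of the 9 positive-negative pairs are ordered correctly
```

Read this way, an AUC of around 80% means the model ranks an initiator above a non-initiator in roughly four out of five such pairs; a value of 50% would correspond to ranking at random.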
Our work demonstrates that ML methods are useful for predicting smoking initiation with high accuracy, identifying novel predictors of smoking initiation, and enhancing our understanding of tobacco use behaviors.
Understanding individual risk factors for smoking initiation is essential to preventing it. With this methodology, a set of the most informative predictors of smoking onset in the PATH data was identified. Besides reconfirming well-known risk factors, the findings suggested additional predictors of smoking initiation that have been overlooked in previous work. More studies that focus on the newly discovered factors (BMI and dental and oral health status) are needed to confirm their predictive power for the onset of smoking and to determine the underlying mechanisms.