Roohafza Hamidreza, Mousavi Elahe, Omidi Razieh, Sadeghi Masoumeh, Sehhati Mohammadreza, Vaez Ahmad
Isfahan Cardiovascular Research Centre, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran.
Department of Bioelectrics and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
Int J Prev Med. 2025 Apr 24;16:27. doi: 10.4103/ijpvm.ijpvm_306_23. eCollection 2025.
Given the rising prevalence of adolescent smoking in recent years, this study proposes a machine learning (ML) approach for identifying adolescents who are prone to start smoking and those who smoke but do not directly admit to it.
We used two repeated-measures cross-sectional studies, comprising data from 7940 individuals, as distinct training and test datasets. The most influential factors were selected using the randomized least absolute shrinkage and selection operator (LASSO). We then evaluated the performance of different ML approaches for automatically classifying students into smoker/nonsmoker and low-risk/high-risk categories.
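The feature-selection step can be illustrated with a minimal sketch of randomized LASSO in the stability-selection sense: a LASSO is refit on many random half-samples with randomly rescaled features, and each feature is scored by how often its coefficient is nonzero. This is not the authors' implementation. The abstract does not give the penalty, the number of resamples, or the software used, so the parameters `lam`, `weakness`, and `n_resamples`, the toy data, and the use of a linear LASSO on a 0/1 outcome are all assumptions made for illustration.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows; no intercept is fitted, so center the data first.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def randomized_lasso(X, y, lam=0.05, weakness=0.5, n_resamples=40, seed=0):
    """Return, per feature, the fraction of resamples in which it was selected.

    Each round draws half the rows without replacement and scales every
    feature by a random weight in [weakness, 1], which is equivalent to
    randomly inflating that feature's L1 penalty.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)
        w = [rng.uniform(weakness, 1.0) for _ in range(p)]
        means = [sum(X[i][j] for i in idx) / len(idx) for j in range(p)]
        y_mean = sum(y[i] for i in idx) / len(idx)
        Xs = [[(X[i][j] - means[j]) * w[j] for j in range(p)] for i in idx]
        ys = [y[i] - y_mean for i in idx]
        beta = lasso_cd(Xs, ys, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


def make_toy_data(n=200, p=6, seed=1):
    """Synthetic data (hypothetical): only features 0 and 1 drive the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1.0 if 2.0 * r[0] - 1.5 * r[1] + rng.gauss(0.0, 0.5) > 0 else 0.0 for r in X]
    return X, y


X, y = make_toy_data()
freq = randomized_lasso(X, y)
for j, f in enumerate(freq):
    print(f"feature {j}: selected in {f:.0%} of resamples")
```

Features would then be ranked by selection frequency and the top ones retained; the cutoff that yields the 15 factors reported in the study is not stated in the abstract.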
Randomized LASSO feature selection prioritized 15 factors as the most influential in smoking, including peer influence, risky behaviors, attitude and school policy toward smoking, family factors, depression, and sex. Applying different ML approaches to the three study plans yielded an area under the curve (AUC) of up to 0.92, sensitivity of up to 0.88, positive predictive value (PPV) of up to 0.72, specificity of up to 0.98, and negative predictive value (NPV) of up to 0.99.
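For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, PPV, NPV, and AUC are computed from true labels and classifier scores. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data or settings from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels coded 1 = smoker, 0 = nonsmoker."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else math.nan


def screening_metrics(y_true, scores, threshold=0.5):
    """Threshold the scores and compute the four screening metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # smokers correctly flagged
        "specificity": _ratio(tn, tn + fp),  # nonsmokers correctly cleared
        "ppv": _ratio(tp, tp + fp),          # flagged students who are smokers
        "npv": _ratio(tn, tn + fn),          # cleared students who are nonsmokers
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy example (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05]
metrics = screening_metrics(y_true, scores)
metrics["auc"] = auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common smoking is in the sample, so they can differ markedly from one another when one class is much smaller than the other.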
The results demonstrate that our ML approach can distinguish smokers from nonsmokers. The model can serve as a brief screening tool for automatically predicting which individuals are susceptible to smoking, supporting more precise preventive interventions focused on adolescents.