Department of Health Administration and Policy, George Mason University, Fairfax, VA, United States.
Drug Alcohol Depend. 2021 Aug 1;225:108789. doi: 10.1016/j.drugalcdep.2021.108789. Epub 2021 May 28.
Identifying the characteristics of adults with recent marijuana use is limited by standard statistical methods and requires a unique approach. The objective of this study is to evaluate the efficiency of machine learning models in predicting daily marijuana use and to identify factors associated with daily use among adults. In 2020, the study analyzed pooled data from the 2016-2019 Behavioral Risk Factor Surveillance System (BRFSS) survey. Prediction models were developed using four machine learning algorithms: Logistic Regression, Decision Tree, Random Forest with the Gini function, and Naïve Bayes. Respondents were randomly divided into training and testing samples. The performance of all models was compared using accuracy, AUC, precision, and recall. The study included 253,569 respondents, of whom 10,182 (5.9%) reported daily marijuana use in the last 30 days. Of the daily marijuana users, 53.4% were young adults (age 18-34 years), 34.3% were female, 56.1% were non-Hispanic White, 15.2% were college graduates, and 67.3% were employed. Random Forest was the best performing model with an AUC of 0.97, followed by Decision Tree (AUC 0.95). The most important factors for daily marijuana use were current use of e-cigarettes and combustible cigarettes, male gender, unmarried status, poor mental health, depression, cognitive decline, abnormal sleep patterns, and high-risk behavior. Data mining methods were useful for discovering behavioral health-risk knowledge and for visualizing the significance of predictive modeling from a multidimensional behavioral health survey.
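The evaluation steps named in the abstract (a random training/testing split, then comparison by accuracy, AUC, precision, and recall, with Gini as the tree splitting criterion) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the abstract does not state the software used, and every function name, the 30% test fraction, and the toy labels and scores below are assumptions made for illustration only.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly divide rows into (training, testing) samples.
    The 0.3 default is an assumption; the abstract gives no split ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2), the usual tree splitting criterion."""
    n = len(labels)
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels))


if __name__ == "__main__":
    # Toy data invented for illustration; 1 = daily use, 0 = otherwise.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8, 0.1, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred), precision(y_true, y_pred),
          recall(y_true, y_pred), auc(y_true, scores))
```

Reporting precision and recall alongside accuracy matters here because daily users are a small minority of respondents, so a model that predicts "no daily use" for everyone would still score high on accuracy alone.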