Dell Nathaniel A, Vaughn Michael G, Prasad Srivastava Sweta, Alsolami Abdulaziz, Salas-Wright Christopher P
School of Social Work, Saint Louis University, St. Louis, MO, 63103, United States.
School of Social Work, Saint Louis University, St. Louis, MO, 63103, United States; Department of Special Education, King Abdulaziz University, Jeddah, Saudi Arabia.
J Psychiatr Res. 2022 Jul;151:590-597. doi: 10.1016/j.jpsychires.2022.05.021. Epub 2022 May 23.
Although several recent studies have examined psychosocial and demographic correlates of cannabis use disorder (CUD) in adults, few, if any, have evaluated the performance of machine learning methods relative to standard logistic regression for identifying correlates of CUD. The present study used pooled data from the 2015-2018 National Survey on Drug Use and Health to evaluate psychosocial and demographic correlates of CUD in adults. In addition, we compared the performance of logistic regression, classification trees, and random forest methods in classifying CUD. On the test data set, classification trees (AUC = 0.84, 95% CI: 0.82, 0.85) and random forest (AUC = 0.83, 95% CI: 0.82, 0.85) performed similarly to each other and better than logistic regression (AUC = 0.77, 95% CI: 0.74, 0.79). Results of the random forest reveal that marital status, risk propensity, age, and cocaine dependence variables contributed most to node purity, whereas model accuracy would decrease significantly if county type, income, race, and education variables were excluded from the model. Machine learning techniques are one possible approach to improving the efficiency and interpretability of analyses of CUD correlates and the clinical insights drawn from them.
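As an illustration of the comparison metric reported above, the following is a minimal sketch of how an AUC and a 95% interval around it can be computed from test-set labels and predicted scores. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, as are the function names `auc` and `bootstrap_ci` and the toy data; none of this is the authors' code.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 outcomes (1 = case, e.g. CUD present).
    scores: predicted scores, higher meaning more likely to be a case.
    Tied scores receive their average rank, so a tie counts as 0.5.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical test-set labels and model scores, for illustration only.
    y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
    y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.90, 0.50, 0.15]
    print("AUC:", auc(y_true, y_score))
    print("95% CI:", bootstrap_ci(y_true, y_score))
```

Applying the same two functions to the held-out predictions of each model (logistic regression, classification tree, random forest) would give comparable point estimates and intervals. With survey data such as the NSDUH, a design-aware resampling scheme may be more appropriate than the simple resampling assumed here.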