Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, Massachusetts, USA.
Susan and Alan Solomont School of Nursing, Zuckerberg College of Health Sciences, University of Massachusetts Lowell, Lowell, Massachusetts, USA.
J Am Med Inform Assoc. 2021 Jun 12;28(6):1216-1224. doi: 10.1093/jamia/ocaa350.
Substance use disorder is a critical public health issue. Discovering the synergies among factors impacting treatment program success can help governments and treatment facilities develop effective policies. In this work, we propose a novel data analytics approach using machine learning models to discover interaction effects that might be neglected by traditional hypothesis-generating approaches.
A patient-episode-level substance use treatment discharge dataset and a Federal Bureau of Investigation crime dataset were joined on core-based statistical area (CBSA) codes. Random forests, artificial neural networks, and extreme gradient boosting were applied using nested cross-validation. Interaction effects were identified from the best-performing machine learning model and then analyzed and tested with traditional logistic regression models on unseen data.
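The nested cross-validation step can be sketched as below. This is a minimal, standard-library-only illustration, not the authors' pipeline. A k-nearest-neighbors scorer, with the neighbor count `k` as the tuned hyperparameter, stands in for random forests, neural networks, and extreme gradient boosting. The fold counts (5 outer, 3 inner), the hyperparameter grid, the record layout, and every function name are assumptions. The inner loop selects a hyperparameter using only the outer-training rows, and the outer loop scores that choice by rank-based AUC on a fold the selection never saw.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (feature vector, label), label 1 = completed treatment.
Row = Tuple[List[float], int]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], n_folds: int, rng: random.Random) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin so every fold keeps both classes."""
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def knn_scores(train: Sequence[Row], test_x: Sequence[List[float]], k: int) -> List[float]:
    """Stand-in model: score = fraction of positives among the k nearest training rows."""
    out = []
    for x in test_x:
        nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
        out.append(sum(y for _, y in nearest) / len(nearest))
    return out


def fold_auc(rows: Sequence[Row], test_idx: Sequence[int], k: int) -> float:
    """Train on every row outside test_idx and return the AUC on the rows inside it."""
    held = set(test_idx)
    train = [r for i, r in enumerate(rows) if i not in held]
    test = [rows[i] for i in test_idx]
    return auc([y for _, y in test], knn_scores(train, [x for x, _ in test], k))


def nested_cv(rows: Sequence[Row], grid: Sequence[int], outer: int = 5,
              inner: int = 3, seed: int = 0) -> List[Tuple[int, float]]:
    """Return (chosen k, outer-fold AUC) for each outer fold."""
    rng = random.Random(seed)
    results = []
    for outer_idx in stratified_folds([y for _, y in rows], outer, rng):
        held = set(outer_idx)
        train = [r for i, r in enumerate(rows) if i not in held]
        # Inner loop: tune k on outer-training rows only, using the same splits for every candidate.
        inner_folds = stratified_folds([y for _, y in train], inner, rng)
        best_k = max(grid, key=lambda k: sum(fold_auc(train, f, k) for f in inner_folds))
        # Outer loop: estimate generalization on the untouched outer fold.
        results.append((best_k, fold_auc(rows, outer_idx, best_k)))
    return results
```

Averaging the outer-fold AUCs gives a performance estimate that is not inflated by hyperparameter tuning; comparing that estimate across model families is one way to pick the best-performing model, which the abstract reports was then used to identify interaction effects.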
In predicting whether a patient completed a treatment program, extreme gradient boosting performed best, with an area under the curve (AUC) of 89.31%. Our procedure identified 73 interaction effects. Of these, 14 were tested with traditional logistic regression models, and 12 were statistically significant (P<.05).
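Testing one discovered interaction with a traditional logistic regression can be sketched as follows. This is an illustrative, standard-library-only sketch under stated assumptions: the abstract does not specify the model terms or the test statistic, so here a single product term `x1 * x2` is added to a main-effects model, fitted by Newton-Raphson maximum likelihood, and assessed with a two-sided Wald z-test against the P<.05 threshold. The two-predictor setup, the simulated coefficients, and all function names are hypothetical. In the workflow described in the methods, such a test would be run on data not used for the machine learning discovery step.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_logistic(X: Sequence[List[float]], y: Sequence[int],
                 iters: int = 25) -> Tuple[List[float], List[List[float]]]:
    """Newton-Raphson MLE; returns coefficients and the inverse Fisher information."""
    p = len(X[0])
    beta = [0.0] * p

    def grad_and_info(b: List[float]):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(c * v for c, v in zip(b, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]          # score: X^T (y - mu)
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]   # information: X^T W X
        return grad, info

    for _ in range(iters):
        grad, info = grad_and_info(beta)
        step = invert(info)
        beta = [beta[j] + sum(step[j][k] * grad[k] for k in range(p)) for j in range(p)]
    _, info = grad_and_info(beta)  # covariance evaluated at the final estimate
    return beta, invert(info)


def interaction_wald_test(x1: Sequence[float], x2: Sequence[float],
                          y: Sequence[int]) -> Tuple[float, float, float]:
    """Fit logit(p) = b0 + b1*x1 + b2*x2 + b3*x1*x2 and Wald-test H0: b3 = 0."""
    X = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    beta, cov = fit_logistic(X, y)
    z = beta[3] / math.sqrt(cov[3][3])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta[3], z, p_value


def simulate(n: int, coefs=(-0.5, 1.0, -0.5, 1.5), seed: int = 0):
    """Synthetic data with a known interaction (hypothetical coefficients, illustration only)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b0, b1, b2, b3 = coefs
    y = [1 if rng.random() < sigmoid(b0 + b1 * a + b2 * b + b3 * a * b) else 0
         for a, b in zip(x1, x2)]
    return x1, x2, y
```

An interaction would be reported as significant when `p_value < 0.05`. A likelihood-ratio test comparing the models with and without the product term is a common alternative to the Wald test.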
We identified new interaction effects among the length of stay, frequency of substance use, changes in self-help group attendance frequency, and other factors. This work provides insights into the interactions between factors impacting treatment completion. Further traditional statistical analysis can be employed by practitioners and policy makers to test the effects discovered by our novel machine learning approach.