School of Public Health Sciences, University of Waterloo, Waterloo, Ontario, Canada.
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
Health Promot Chronic Dis Prev Can. 2023 Feb;43(2):73-86. doi: 10.24095/hpcdp.43.2.03.
In population health surveillance research, survey data are commonly analyzed using regression methods; however, these methods have limited ability to examine complex relationships. In contrast, decision tree models are ideally suited for segmenting populations and examining complex interactions among factors, and their use within health research is growing. This article provides a methodological overview of decision trees and their application to youth mental health survey data.
The performance of two popular decision tree techniques, the classification and regression tree (CART) and conditional inference tree (CTREE) techniques, is compared to traditional linear and logistic regression models through an application to youth mental health outcomes in the COMPASS study. Data were collected from 74 501 students across 136 schools in Canada. Anxiety, depression and psychosocial well-being outcomes were measured along with 23 sociodemographic and health behaviour predictors. Model performance was assessed using measures of prediction accuracy, parsimony and relative variable importance.
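To make the CART technique concrete, the following is a minimal sketch of how one binary split can be chosen by minimizing weighted Gini impurity, which is the standard CART criterion for classification. It is an illustration only, not the analysis code used in the study: the data are invented, and the variable names (hours of sleep, screen time, an "elevated symptoms" flag) and the helper functions `gini` and `best_split` are hypothetical. A full tree would apply the same search recursively to each resulting subgroup and then prune.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values
    of each feature. If no split improves on the unsplit node, the result is
    (None, None, impurity_of_unsplit_node).
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Impurity of the two child nodes, weighted by their sizes.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Invented toy data (assumption, not COMPASS data): columns are
# (hours of sleep, daily screen-time hours); label 1 marks a hypothetical
# "elevated symptoms" flag.
rows = [(8, 2), (7, 3), (9, 1), (5, 6), (4, 7), (6, 8), (5, 2), (8, 7)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

In this toy example the search selects the first column (hours of sleep) with a threshold of 6.5, which separates the two classes completely. The resulting rule, "sleep of 6.5 hours or less", is the kind of subgroup definition that a tree reports directly and that a regression coefficient does not.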
Decision tree and regression models consistently identified the same sets of most important predictors for each outcome, indicating broad agreement between the methods. Tree models had lower prediction accuracy than the regression models but were more parsimonious and placed greater relative importance on key differentiating factors.
Decision trees provide a means of identifying high-risk subgroups to whom prevention and intervention efforts can be targeted, making them a useful tool to address research questions that cannot be answered by traditional regression methods.