King Matthew W, Resick Patricia A
Women's Health Sciences Division, National Center for PTSD, VA Boston Healthcare System.
J Consult Clin Psychol. 2014 Oct;82(5):895-905. doi: 10.1037/a0035886. Epub 2014 Mar 3.
Data mining of treatment study results can reveal unforeseen but critical insights, such as who receives the most benefit from treatment and under what circumstances. The usefulness and legitimacy of exploratory data analysis have received relatively little recognition, however, and analytic methods well suited to the task are not widely known in psychology. With roots in computer science and statistics, statistical learning approaches offer a credible option: These methods take a more inductive approach to building a model than is done in traditional regression, allowing the data a greater role in suggesting the correct relationships between variables rather than imposing them a priori. Classification and regression trees are presented as a powerful, flexible exemplar of statistical learning methods. Trees allow researchers to efficiently identify useful predictors of an outcome and discover interactions between predictors without the need to anticipate and specify these in advance, making them ideal for revealing patterns that inform hypotheses about treatment effects. Trees can also provide a predictive model for forecasting outcomes as an aid to clinical decision making. This primer describes how tree models are constructed, how the results are interpreted and evaluated, and how trees overcome some of the complexities of traditional regression. Examples are drawn from randomized clinical trial data and highlight some interpretations of particular interest to treatment researchers. The limitations of tree models are discussed, and suggestions for further reading and choices in software are offered.