Mousumi Banerjee, Evan Reynolds, Hedvig B. Andersson, Brahmajee K. Nallamothu
Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor (M.B., E.R.).
Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor (M.B., H.B.A., B.K.N.).
Circ Cardiovasc Qual Outcomes. 2019 May;12(5):e004879. doi: 10.1161/CIRCOUTCOMES.118.004879.
Tree-based methods are among the most flexible, intuitive, and powerful data analytic tools for exploring complex data structures, and they provide a natural framework for creating patient subgroups for risk classification. In this article, we review methodological and practical aspects of tree-based methods, with a focus on diagnostic classification (binary outcome) and prognostication (censored survival outcome). Because a single tree can be unstable, creating an ensemble of trees improves prediction accuracy and addresses that instability. We describe ensemble methods that rely on resampling from the original data. Finally, we present methods to identify a representative tree from the ensemble that can be used for clinical decision-making. We illustrate the methods using data on ischemic heart disease classification and data from SPRINT (Systolic Blood Pressure Intervention Trial) on adverse events in patients with high blood pressure.