Reynolds Evan, Callaghan Brian, Banerjee Mousumi
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109.
Department of Neurology, University of Michigan, Ann Arbor, MI 48109.
J Appl Stat. 2019;46(16):2987-3007. doi: 10.1080/02664763.2019.1625876. Epub 2019 Jun 7.
Classification and regression trees (CART) and support vector machines (SVM) have become very popular statistical learning tools for analyzing complex data that often arise in biomedical research. While both CART and SVM serve as powerful classifiers in many clinical settings, there are some common scenarios in which each fails to meet the performance and interpretability needed for use as a clinical decision-making tool. In this paper, we propose a new classification method, SVM-CART, that combines features of SVM and CART to produce a more flexible classifier that has the potential to outperform either method in terms of interpretability and prediction accuracy. Furthermore, to enhance prediction accuracy we provide extensions of a single SVM-CART to an ensemble, and methods to extract a representative classifier from the SVM-CART ensemble. The goal is to produce a decision-making tool that can be used in the clinical setting, while still harnessing the stability and predictive improvements gained through developing the SVM-CART ensemble. An extensive simulation study is conducted to assess the performance of the methods in various settings. Finally, we illustrate our methods using a clinical neuropathy dataset.