Zhang Rui, Xin Rui, Seltzer Margo, Rudin Cynthia
Duke University.
University of British Columbia.
Proc AAAI Conf Artif Intell. 2023 Jun;37(9):11270-11279. doi: 10.1609/aaai.v37i9.26334.
Regression trees are among the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering problem on one-dimensional data. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly correlated features.
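To make the idea behind the lower bound concrete: a tree with at most k leaves predicts at most k distinct values, so its sum of squared errors on a set of samples cannot be smaller than the optimal k-Means cost of those samples' labels, treated as one-dimensional points. The following is a minimal sketch of that quantity, not the authors' implementation. The function name `kmeans_1d_cost` and the simple O(k n^2) dynamic program are assumptions made for illustration, and the sketch ignores any sparsity penalty or sample weighting the full method may use.

```python
from typing import Sequence


def kmeans_1d_cost(values: Sequence[float], k: int) -> float:
    """Optimal k-means cost (sum of squared deviations) for 1D data.

    Hypothetical helper for illustration. In one dimension, optimal
    clusters are contiguous runs of the sorted values, so a dynamic
    program over split positions finds the exact optimum.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        return 0.0
    k = min(k, n)  # more clusters than points cannot help

    # Prefix sums of x and x^2 give the cost of any run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        """Squared error of xs[i:j] around its own mean (requires j > i)."""
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # best[j]: optimal cost of xs[:j] with the current number of clusters.
    best = [0.0] + [sse(0, j) for j in range(1, n + 1)]  # one cluster
    for c in range(2, k + 1):
        new = [inf] * (n + 1)
        for j in range(c, n + 1):
            # The last cluster is xs[i:j]; the first i points use c - 1 clusters.
            new[j] = min(best[i] + sse(i, j) for i in range(c - 1, j))
        best = new
    return best[n]
```

In a branch-and-bound search, a value like `kmeans_1d_cost(labels, k)` could be compared against the best objective found so far to discard subproblems that cannot improve on it. How such a bound is combined with the regularization term and the dynamic program over subproblems is specific to the paper and is not reproduced here.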