Abhishek Ghose, Balaraman Ravindran
Department of Computer Science and Engineering, IIT Madras, Chennai, India.
Department of Computer Science and Engineering, Robert Bosch Centre for Data Science and AI, IIT Madras, Chennai, India.
Front Artif Intell. 2020 Feb 25;3:3. doi: 10.3389/frai.2020.00003. eCollection 2020.
Models often need to be constrained to a certain size for them to be considered interpretable. For example, a decision tree of depth 5 is much easier to understand than one of depth 50. Limiting model size, however, often reduces accuracy. We suggest a practical technique that minimizes this trade-off between interpretability and classification accuracy. This enables an arbitrary learning algorithm to produce highly accurate small-sized models. Our technique identifies the training data distribution to learn from that leads to the highest accuracy for a model of a given size. We represent the training distribution as a combination of sampling schemes. Each scheme is defined by a parameterized probability mass function applied to the segmentation produced by a decision tree. An Infinite Mixture Model with Beta components is used to represent a combination of such schemes. The mixture model parameters are learned using Bayesian Optimization. Under simplistic assumptions, we would need to optimize for O(d) variables for a distribution over a d-dimensional input space, which is cumbersome for most real-world data. However, we show that our technique significantly reduces this number to a constant at the cost of relatively cheap preprocessing. The proposed technique is flexible: it is model-agnostic, i.e., it may be applied to the learning algorithm for any model family, and it admits a general notion of model size. We demonstrate its effectiveness using multiple real-world datasets to construct decision trees, linear probability models, and gradient boosted models of different sizes. We observe significant improvements in the F1-score in most instances, exceeding an improvement of 100% in some cases.
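To make the core loop concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the idea described above: learn a training-data sampling distribution, defined over the leaves of a segmentation decision tree, that maximizes the held-out F1-score of a small model. It assumes scikit-learn and scikit-optimize are available, and it replaces the paper's Infinite Mixture Model with Beta components by a direct per-leaf weight vector, so the number of optimization variables here grows with the number of leaves rather than staying constant as in the paper. All names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from skopt import gp_minimize

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: a larger tree segments the input space; each training point falls
# into one leaf. This plays the role of the segmentation in the abstract.
seg_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
leaf_ids = seg_tree.apply(X_tr)
leaves = np.unique(leaf_ids)  # sorted leaf identifiers

def objective(leaf_weights):
    # Turn per-leaf weights into a sampling distribution over training points.
    w = np.asarray(leaf_weights)
    p = w[np.searchsorted(leaves, leaf_ids)]
    p = p / p.sum()
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True, p=p)
    # Step 2: train the *small* model (here, a depth-3 tree) on the resample.
    small = DecisionTreeClassifier(max_depth=3, random_state=0)
    small.fit(X_tr[idx], y_tr[idx])
    # gp_minimize minimizes, so return the negative validation F1-score.
    return -f1_score(y_val, small.predict(X_val))

# Step 3: Bayesian Optimization over the sampling parameters (one per leaf in
# this sketch; the paper compresses this to a constant number of variables).
res = gp_minimize(objective, dimensions=[(1e-3, 1.0)] * len(leaves),
                  n_calls=30, random_state=0)
print("best validation F1 for the small model:", -res.fun)
```

A uniform weight vector recovers ordinary training on the empirical distribution, so the optimizer can only match or improve on the baseline small model up to the noise introduced by resampling; in this sketch the objective is stochastic, which Gaussian-process-based optimizers such as gp_minimize tolerate but do not model explicitly.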