Gross Samuel M, Tibshirani Robert
Nuna, 650 Townsend St, San Francisco, CA.
Department of Statistics, Stanford University, Stanford, CA.
Comput Stat Data Anal. 2016 Sep;101:226-235. doi: 10.1016/j.csda.2016.02.015. Epub 2016 Mar 12.
A model is presented for the supervised learning problem where the observations come from a fixed number of pre-specified groups, and the regression coefficients may vary sparsely between groups. The model spans the continuum between individual models for each group and one model for all groups. The resulting algorithm is designed with a high dimensional framework in mind. The approach is applied to a sentiment analysis dataset to show its efficacy and interpretability. One particularly useful application is for finding sub-populations in a randomized trial for which an intervention (treatment) is beneficial, often called the uplift problem. Some new concepts are introduced that are useful for uplift analysis. The value is demonstrated in an application to a real world credit card promotion dataset. In this example, although sending the promotion has a very small average effect, by targeting a particular subgroup with the promotion one can obtain a 15% increase in the proportion of people who purchase the new credit card.