Leben Derek
Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, United States.
Front Psychol. 2023 Feb 14;14:1069426. doi: 10.3389/fpsyg.2023.1069426. eCollection 2023.
This paper proposes that explanations are valuable to those impacted by a model's decisions (model patients) to the extent that they provide evidence that a past adverse decision was unfair. Under this proposal, we should favor models and explainability methods which generate counterfactuals of two types. The first type of counterfactual is evidence of fairness: a set of states under the control of the patient which (if changed) would have led to a beneficial decision. The second type of counterfactual is evidence of unfairness: a set of irrelevant group or behavioral attributes which (if changed) would have led to a beneficial decision. Each of these counterfactual statements is related to fairness, under the Liberal Egalitarian idea that treating one person differently from another is justified only on the basis of features which were plausibly under each person's control. Other aspects of an explanation, such as feature importance and actionable recourse, are not essential under this view, and need not be a goal of explainable AI.
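The two counterfactual types can be made concrete with a toy sketch. Everything below is invented for illustration and is not from the paper: a hypothetical loan-approval rule that penalizes membership in group "B" (an irrelevant attribute), plus a brute-force search for single-feature counterfactuals of each type.

```python
def decide(applicant):
    """Hypothetical biased loan rule: scores income and debt, but also
    penalizes group 'B' membership (an irrelevant attribute)."""
    score = applicant["income"] / 10_000 - applicant["debt"] / 5_000
    if applicant["group"] == "B":
        score -= 2  # irrelevant attribute affecting the outcome
    return score >= 3

def counterfactuals(applicant, feature_values):
    """Brute-force single-feature changes that flip a denial to approval."""
    found = []
    for feature, values in feature_values.items():
        for value in values:
            if value == applicant[feature]:
                continue
            alt = dict(applicant, **{feature: value})
            if decide(alt):
                found.append((feature, value))
    return found

applicant = {"income": 50_000, "debt": 10_000, "group": "B"}
assert not decide(applicant)  # the applicant is denied

# Type 1 (evidence of fairness): features under the applicant's control
# which, if changed, would have led to approval.
controllable = counterfactuals(applicant, {"income": [60_000, 80_000],
                                           "debt": [0, 5_000]})

# Type 2 (evidence of unfairness): an irrelevant attribute which, if
# changed, would have led to approval.
irrelevant = counterfactuals(applicant, {"group": ["A"]})

print(controllable)  # [('income', 80000), ('debt', 0)]
print(irrelevant)    # [('group', 'A')]
```

On the paper's view, the first list shows a path to approval under the applicant's control, while the non-empty second list is evidence that the denial turned on an irrelevant group attribute, and so was unfair.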