Wu Yinjun, Keoliya Mayank, Chen Kan, Velingker Neelay, Li Ziyang, Getzen Emily J, Long Qi, Naik Mayur, Parikh Ravi B, Wong Eric
School of Computer Science, Peking University, Beijing, China.
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States.
Proc Mach Learn Res. 2024 Jul;235:53597-53618.
Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations serve a dual purpose: they also identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.
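The abstract's key insight, that a rule-based explanation doubles as a query selecting a subgroup of similar samples, can be sketched as follows. This is a minimal illustration, not DISCRET's actual implementation: the rule format, field names (`t` for treatment, `y` for outcome), and the simple difference-in-means ITE estimate are all assumptions made for the example.

```python
# Sketch: a rule-based explanation used as a subgroup query for ITE estimation.
# All names and data below are illustrative, not part of the DISCRET codebase.

def matches(sample, rule):
    """Check whether a sample satisfies every (feature, op, threshold) literal."""
    ops = {"<=": lambda a, b: a <= b, ">": lambda a, b: a > b}
    return all(ops[op](sample[feat], thr) for feat, op, thr in rule)

def subgroup_ite(dataset, rule):
    """Estimate the treatment effect for the subgroup selected by `rule`
    as the difference in mean outcomes between treated and control members."""
    group = [s for s in dataset if matches(s, rule)]
    treated = [s["y"] for s in group if s["t"] == 1]
    control = [s["y"] for s in group if s["t"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

data = [
    {"age": 62, "t": 1, "y": 3.0},
    {"age": 70, "t": 1, "y": 5.0},
    {"age": 65, "t": 0, "y": 2.0},
    {"age": 40, "t": 1, "y": 1.0},  # excluded by the rule below
]
rule = [("age", ">", 60)]  # a hypothetical synthesized explanation
print(subgroup_ite(data, rule))  # → 2.0
```

In DISCRET, such rules are not hand-written; they are synthesized per sample by the RL algorithm described in the paper, and the subgroup they select grounds the model's prediction, which is what makes the explanation faithful by construction.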