Tang Liyan, Peng Yifan, Wang Yanshan, Ding Ying, Durrett Greg, Rousseau Justin F
The University of Texas at Austin.
Weill Cornell Medicine.
Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:12532-12555. doi: 10.18653/v1/2023.findings-acl.794.
A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating the interpretation of a radiology report from its findings, a system that predicts only highly likely outcomes may be of limited use, since such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis that goes beyond the most likely options. We introduce a new task, "less likely brainstorming," that asks a model to generate outputs that humans think are relevant but less likely to happen. We explore the task in two settings: a brain MRI interpretation generation setting and an everyday commonsense reasoning setting. We found that a baseline approach of training with less likely hypotheses as targets generates outputs that humans evaluate as either likely or irrelevant nearly half of the time; standard MLE training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating likely and less likely outputs according to humans. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that our models' ability to generate less likely outputs is improved.
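To make the contrastive idea concrete, one generic formulation (a minimal sketch, not the authors' actual training objective) is a margin loss over sequence log-likelihoods: when conditioning on a "less likely" control signal, the model is penalized unless it scores the less-likely target above the likely one by a margin. All function names and values below are illustrative assumptions.

```python
def sequence_logprob(token_logprobs):
    """Score a candidate sequence as the sum of its per-token log-probabilities."""
    return sum(token_logprobs)

def contrastive_margin_loss(likely_logprobs, less_likely_logprobs, margin=1.0):
    """Hypothetical margin-based contrastive loss.

    Under a 'less likely' control signal, the loss is zero only when the
    less-likely target outscores the likely target by at least `margin`;
    otherwise the gap contributes to the loss. This is a generic sketch of
    contrastive training for controlled generation, not the paper's method.
    """
    score_likely = sequence_logprob(likely_logprobs)
    score_less_likely = sequence_logprob(less_likely_logprobs)
    return max(0.0, margin - (score_less_likely - score_likely))

# Toy per-token log-probs: the model currently prefers the likely output,
# so the loss is positive and training would push the scores apart.
loss = contrastive_margin_loss([-0.1, -0.2], [-1.0, -1.5], margin=1.0)
```

Here the likely sequence scores -0.3 and the less-likely sequence -2.5, so the loss is 1.0 - (-2.5 - (-0.3)) = 3.2; gradient updates would raise the less-likely sequence's likelihood relative to the likely one.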