
A benchmarking framework and dataset for learning to defer in human-AI decision-making.

Author information

Alves Jean V, Leitão Diogo, Jesus Sérgio, Sampaio Marco O P, Liébana Javier, Saleiro Pedro, Figueiredo Mário A T, Bizarro Pedro

Affiliations

Feedzai, Coimbra, Portugal.

Instituto Superior Técnico, ULisboa, Lisboa, Portugal.

Publication information

Sci Data. 2025 Apr 23;12(1):506. doi: 10.1038/s41597-025-04664-y.

Abstract

Learning to Defer (L2D) algorithms improve human-AI collaboration by deferring decisions to human experts when they are likely to be more accurate than the AI model. Such algorithms can be crucial in high-stakes tasks like fraud detection, where false negatives can cost victims their life savings. The primary challenge in training and evaluating these systems is the high cost of acquiring expert predictions, which often leads benchmarks to rely on simplistic simulated expert behavior. We introduce OpenL2D, a framework that generates synthetic experts with adjustable decision-making processes and work capacity constraints for more realistic L2D testing. Applied to a public fraud detection dataset, OpenL2D creates the financial fraud alert review dataset (FiFAR), which contains predictions from 50 fraud analysts for 30K instances. We show that FiFAR's synthetic experts are similar to real experts in metrics such as consistency and inter-expert agreement. Our L2D benchmark reveals that performance rankings of L2D algorithms vary significantly depending on the available experts, highlighting the need to consider diverse expert behavior in L2D benchmarking.
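
To make the idea of deferral under work capacity constraints concrete, the following is a minimal toy sketch, not the OpenL2D implementation or any algorithm from the benchmark. It assumes that estimates of each decision-maker's probability of being correct are already available for every instance, and it greedily defers an instance to an expert only when that expert is expected to beat the model and still has capacity left. The function name `assign_with_capacity`, the `MODEL` label, and all inputs are hypothetical.

```python
from typing import Dict, List

MODEL = "model"  # hypothetical label meaning "the AI model decides"


def assign_with_capacity(
    model_correct: List[float],
    expert_correct: Dict[str, List[float]],
    capacity: Dict[str, int],
) -> List[str]:
    """Greedy deferral: pick the largest expected accuracy gains first.

    model_correct[i]         assumed P(model is correct on instance i)
    expert_correct[name][i]  assumed P(expert `name` is correct on instance i)
    capacity[name]           maximum number of instances the expert can review
    """
    remaining = dict(capacity)
    assignment = [MODEL] * len(model_correct)

    # Keep only (gain, instance, expert) triples where the expert is
    # expected to be more accurate than the model.
    candidates = [
        (probs[i] - model_correct[i], i, name)
        for name, probs in expert_correct.items()
        for i in range(len(model_correct))
        if probs[i] > model_correct[i]
    ]
    # Process the largest expected gain first.
    candidates.sort(key=lambda c: -c[0])

    for _gain, i, name in candidates:
        # Defer only if the instance is still with the model and the
        # expert has not used up their capacity.
        if assignment[i] == MODEL and remaining.get(name, 0) > 0:
            assignment[i] = name
            remaining[name] -= 1
    return assignment


if __name__ == "__main__":
    result = assign_with_capacity(
        model_correct=[0.9, 0.6, 0.5],
        expert_correct={"a": [0.8, 0.9, 0.9], "b": [0.7, 0.7, 0.8]},
        capacity={"a": 1, "b": 1},
    )
    print(result)
```

In this toy setting the greedy rule is a heuristic, not an optimal assignment. It only illustrates why the set of available experts and their capacities can change which instances are deferred, and therefore how an L2D method scores in a benchmark.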

