Active learning with human heuristics: an algorithm robust to labeling bias.

Author information

Ravichandran Sriram, Sudarsanam Nandan, Ravindran Balaraman, Katsikopoulos Konstantinos V

Affiliations

Department of Management Studies, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.

Department of Data Science and AI, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.

Publication information

Front Artif Intell. 2024 Nov 19;7:1491932. doi: 10.3389/frai.2024.1491932. eCollection 2024.

Abstract

Active learning enables prediction models to achieve better performance faster by adaptively querying an oracle for the labels of data points. Sometimes the oracle is a human, for example when a medical diagnosis is provided by a doctor. According to the behavioral sciences, people, because they employ heuristics, might sometimes exhibit biases in labeling. How does modeling the oracle as a human heuristic affect the performance of active learning algorithms? If there is a drop in performance, can one design active learning algorithms robust to labeling bias? The present article provides answers. We investigate two established human heuristics (fast-and-frugal tree, tallying model) combined with four active learning algorithms (entropy sampling, multi-view learning, conventional information density, and, our proposal, inverse information density) and three standard classifiers (logistic regression, random forests, support vector machines), and apply their combinations to 15 datasets where people routinely provide labels, such as health and other domains like marketing and transportation. There are two main results. First, we show that if a heuristic provides labels, the performance of active learning algorithms significantly drops, sometimes below random. Hence, it is key to design active learning algorithms that are robust to labeling bias. Our second contribution is to provide such a robust algorithm. The proposed inverse information density algorithm, which is inspired by human psychology, achieves an overall improvement of 87% over the best of the other algorithms. In conclusion, designing and benchmarking active learning algorithms can benefit from incorporating the modeling of human heuristics.
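
As an illustration of two components named in the abstract, the following minimal Python sketch pairs entropy sampling (the learner queries the pool point whose predicted label is most uncertain) with a tallying oracle (the label depends only on how many binary cues are positive). It is not the authors' implementation: the cue values, the threshold of 2, the predicted probabilities, and all function names are hypothetical. The proposed inverse information density algorithm is not reproduced here because the abstract does not define it.

```python
import math


def tallying_label(cues, threshold):
    """Tallying heuristic: count the positive binary cues and compare to a threshold."""
    return 1 if sum(cues) >= threshold else 0


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_sampling(probabilities):
    """Return the index of the pool point whose prediction is most uncertain."""
    return max(range(len(probabilities)),
               key=lambda i: binary_entropy(probabilities[i]))


# Hypothetical pool: each tuple holds three binary cues for one unlabeled point.
pool = [(1, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 1)]
# Hypothetical model outputs P(label = 1) for each pool point.
predicted = [0.90, 0.05, 0.55, 0.99]

# The learner picks the most uncertain point; the heuristic oracle labels it.
query_index = entropy_sampling(predicted)
heuristic_label = tallying_label(pool[query_index], threshold=2)
```

Because the tallying oracle ignores which cues are positive and looks only at their count, its labels can differ systematically from the ground truth. This is the kind of labeling bias the abstract refers to.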
