Mapping global dynamics of benchmark creation and saturation in artificial intelligence.

Author information

Institute of Artificial Intelligence, Medical University of Vienna, Währingerstraße 25a, 1090 Vienna, Austria.

ITTM S.A.-Information Technology for Translational Medicine, Esch-sur-Alzette, 4354, Luxembourg.

Publication information

Nat Commun. 2022 Nov 10;13(1):6793. doi: 10.1038/s41467-022-34591-0.

Abstract

Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.
