State-of-the-Art: The Temporal Order of Benchmarking Culture

Author information

Alexander Campolo

Affiliation

Department of Geography, Durham University, Lower Mountjoy, South Road, Durham, DH1 3LE, UK.

Publication information

Digit Soc. 2025;4(2):35. doi: 10.1007/s44206-025-00190-x. Epub 2025 May 2.

DOI: 10.1007/s44206-025-00190-x

PMID: 40322469

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12048445/

Abstract

This commentary situates the epistemic values of machine learning's culture of benchmarking and evaluation within larger temporal structures. Beyond questions of validity, whether model comparisons are statistically valid or whether benchmarks adequately represent meaningful tasks or capabilities, it asks how benchmarks produce certain temporal values and expectations. It articulates two hypotheses in response: the first, termed normalizing research, seeks to characterize how benchmarking simultaneously serves a disciplining and motivating function in research, with the effect of minimizing conflict. The second, termed extrapolation, argues that the incremental, progressive rhythm of benchmarking is oriented less towards the future than towards a present state-of-the-art (SOTA). Together, these hypotheses inform a diagnosis of the presentist temporality of benchmarking and evaluation in machine learning.

