PMLB: a large benchmark suite for machine learning evaluation and comparison.

Authors

Olson Randal S, La Cava William, Orzechowski Patryk, Urbanowicz Ryan J, Moore Jason H

Affiliations

Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104 PA USA.

Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Kraków, Poland.

Publication details

BioData Min. 2017 Dec 11;10:36. doi: 10.1186/s13040-017-0154-4. eCollection 2017.

DOI:10.1186/s13040-017-0154-4
PMID:29238404
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5725843/
Abstract

BACKGROUND

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.

RESULTS

The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered.
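To make the idea of comparing meta-features concrete, the following is a minimal sketch rather than the authors' implementation. The function name `meta_features`, the toy data, and the particular imbalance score are all illustrative assumptions; the abstract does not specify which meta-features or formulas the study used.

```python
from collections import Counter


def meta_features(X, y):
    """Return a few simple meta-features for a labelled tabular dataset.

    X is a list of rows (each a list of feature values) and y is the
    matching list of class labels. Hypothetical helper for illustration.
    """
    n_instances = len(X)
    n_features = len(X[0]) if X else 0
    counts = Counter(y)
    n_classes = len(counts)

    # Illustrative imbalance score (an assumption, not necessarily the
    # paper's definition): scaled squared deviation of the observed class
    # proportions from a uniform split. It is 0.0 for perfectly balanced
    # classes and approaches 1.0 as one class dominates.
    if n_classes > 1:
        deviation = sum(
            (count / n_instances - 1 / n_classes) ** 2
            for count in counts.values()
        )
        imbalance = deviation * n_classes / (n_classes - 1)
    else:
        imbalance = 0.0

    return {
        "n_instances": n_instances,
        "n_features": n_features,
        "n_classes": n_classes,
        "imbalance": imbalance,
    }


# Toy datasets (made up for this sketch) sharing the same feature matrix.
X_toy = [[0, 1], [1, 0], [1, 1], [0, 0]]
balanced = meta_features(X_toy, [0, 1, 0, 1])
skewed = meta_features(X_toy, [0, 0, 0, 1])
print(balanced)
print(skewed)
```

Computing such a summary for every dataset in a collection gives one row per dataset, which can then be compared or clustered to see how much of the meta-feature space the collection covers.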

CONCLUSIONS

This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/175b/5725843/c24d6acf8027/13040_2017_154_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/175b/5725843/721d22942789/13040_2017_154_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/175b/5725843/c45aff90e46d/13040_2017_154_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/175b/5725843/9fe440e64a3a/13040_2017_154_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/175b/5725843/6f0988ffaf59/13040_2017_154_Fig5_HTML.jpg
