
Forecast evaluation for data scientists: common pitfalls and best practices.

Authors

Hewamalage Hansika, Ackermann Klaus, Bergmeir Christoph

Affiliations

School of Computer Science & Engineering, University of New South Wales, Sydney, Australia.

SoDa Labs and Department of Econometrics & Business Statistics, Monash Business School, Monash University, Melbourne, Australia.

Publication details

Data Min Knowl Discov. 2023;37(2):788-832. doi: 10.1007/s10618-022-00894-5. Epub 2022 Dec 2.

DOI: 10.1007/s10618-022-00894-5
PMID: 36504672
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9718476/
Abstract

Recent trends in the Machine Learning (ML), and in particular Deep Learning (DL), domains have demonstrated that, with the availability of massive amounts of time series, ML and DL techniques are competitive in time series forecasting. Nevertheless, the different forms of non-stationarity associated with time series challenge the capabilities of data-driven ML models. Furthermore, because the forecasting domain has been fostered mainly by statisticians and econometricians over the years, the concepts related to forecast evaluation are not mainstream knowledge among ML researchers. We demonstrate in our work that, as a consequence, ML researchers oftentimes adopt flawed evaluation practices, which result in spurious conclusions that make methods which are not competitive in reality appear competitive. Therefore, in this work we provide a tutorial-like compilation of the details associated with forecast evaluation. This way, we intend to impart the information associated with forecast evaluation to fit the context of ML, as a means of bridging the knowledge gap between traditional forecasting methods and the adoption of current state-of-the-art ML techniques. We elaborate the details of the different problematic characteristics of time series, such as non-normality and non-stationarities, and how they are associated with common pitfalls in forecast evaluation. Best practices in forecast evaluation are outlined with respect to the different steps, such as data partitioning, error calculation, statistical testing, and others. Further guidelines are also provided on selecting valid and suitable error measures depending on the specific characteristics of the dataset at hand.
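The abstract names data partitioning and error calculation among the evaluation steps. As an illustration only (our sketch, not code from the paper), the Python below assumes a univariate series in a list, two toy forecasters (`seasonal_naive`, `drift`), and MASE as the scale-free error measure; every function name is hypothetical. Each test window lies strictly after its training window (rolling origin with an expanding window), instead of shuffling observations as a random k-fold split would.

```python
"""Minimal sketch: rolling-origin evaluation scored with MASE.

Assumptions (ours, not from the paper): a univariate series stored in a list,
two toy forecasters, and MASE scaled by the in-sample MAE of the lag-m naive
forecast on each training window. All names here are hypothetical.
"""
from statistics import mean
from typing import Callable, Iterator, List, Sequence, Tuple

# A forecaster maps (training window, horizon) to `horizon` point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def seasonal_naive(train: Sequence[float], horizon: int, m: int = 1) -> List[float]:
    """Repeat the last observed season; m=1 gives the plain naive forecast."""
    last_season = list(train[-m:])
    return [last_season[i % m] for i in range(horizon)]


def drift(train: Sequence[float], horizon: int) -> List[float]:
    """Extend the straight line through the first and last training points."""
    if len(train) < 2:
        return seasonal_naive(train, horizon)
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]


def mase(train: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], m: int = 1) -> float:
    """Out-of-sample MAE divided by the in-sample MAE of the lag-m naive forecast."""
    scale = mean(abs(train[t] - train[t - m]) for t in range(m, len(train)))
    if scale == 0:
        raise ValueError("MASE is undefined when the training window is constant")
    return mean(abs(a - f) for a, f in zip(actual, forecast)) / scale


def rolling_origin_splits(n: int, initial: int, horizon: int,
                          step: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (train_end, test_end) pairs; each test window follows its training window."""
    origin = initial
    while origin + horizon <= n:
        yield origin, origin + horizon
        origin += step


def evaluate(series: Sequence[float], model: Forecaster, initial: int,
             horizon: int, step: int = 1, m: int = 1) -> List[float]:
    """Score one forecast per origin with an expanding window; no shuffling."""
    scores = []
    for train_end, test_end in rolling_origin_splits(len(series), initial, horizon, step):
        train = series[:train_end]             # only data observed before the origin
        actual = series[train_end:test_end]    # the next `horizon` observations
        scores.append(mase(train, actual, model(train, horizon), m))
    return scores


if __name__ == "__main__":
    y = [10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23]  # toy data
    for name, model in [("naive", seasonal_naive), ("drift", drift)]:
        scores = evaluate(y, model, initial=6, horizon=2, step=2)
        print(name, [round(s, 3) for s in scores], round(mean(scores), 3))
    # naive [0.556, 0.875, 1.8] 1.077
    # drift [1.167, 0.625, 0.9] 0.897
```

A MASE below 1 means the model's out-of-sample MAE at that origin is lower than the in-sample MAE of the one-step naive forecast. The per-origin scores are kept, rather than only their mean, so that they can feed a subsequent statistical comparison between models.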


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/bc1af8e872ec/10618_2022_894_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/75828d4af3d6/10618_2022_894_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/287b3d6b92c9/10618_2022_894_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/70db81de1587/10618_2022_894_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/d49a4bc02198/10618_2022_894_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/425245bb4dcc/10618_2022_894_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/14f42c7b3b39/10618_2022_894_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/dbf41fab4735/10618_2022_894_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/76bf5d9aaf4f/10618_2022_894_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/df54adce05bc/10618_2022_894_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/fcffe4a4a6d4/10618_2022_894_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/4b3af060c6a1/10618_2022_894_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/c7835870a498/10618_2022_894_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0610/9718476/5812dcd2ca6b/10618_2022_894_Fig14_HTML.jpg

Similar articles

1
Forecast evaluation for data scientists: common pitfalls and best practices.
Data Min Knowl Discov. 2023;37(2):788-832. doi: 10.1007/s10618-022-00894-5. Epub 2022 Dec 2.
2
Increased adoption of best practices in ecological forecasting enables comparisons of forecastability.
Ecol Appl. 2022 Mar;32(2):e2500. doi: 10.1002/eap.2500. Epub 2021 Dec 14.
3
Time Series Forecasting of Univariate Agrometeorological Data: A Comparative Performance Evaluation via One-Step and Multi-Step Ahead Forecasting Strategies.
Sensors (Basel). 2021 Apr 1;21(7):2430. doi: 10.3390/s21072430.
4
A MATLAB toolbox to fit and forecast growth trajectories using phenomenological growth models: Application to epidemic outbreaks.
Res Sq. 2023 Apr 21:rs.3.rs-2724940. doi: 10.21203/rs.3.rs-2724940/v1.
5
A COVID-19 Pandemic Artificial Intelligence-Based System With Deep Learning Forecasting and Automatic Statistical Data Acquisition: Development and Implementation Study.
J Med Internet Res. 2021 May 20;23(5):e27806. doi: 10.2196/27806.
6
Analyzing and Forecasting Pediatric Fever Clinic Visits in High Frequency Using Ensemble Time-Series Methods After the COVID-19 Pandemic in Hangzhou, China: Retrospective Study.
JMIR Med Inform. 2023 Sep 20;11:e45846. doi: 10.2196/45846.
7
Probabilistic Load Forecasting for Building Energy Models.
Sensors (Basel). 2020 Nov 15;20(22):6525. doi: 10.3390/s20226525.
8
Using time series analysis to forecast the health-related quality of life of post-menopausal women with non-metastatic ER+ breast cancer: A tutorial and case study.
Res Social Adm Pharm. 2020 Aug;16(8):1095-1099. doi: 10.1016/j.sapharm.2019.11.009. Epub 2019 Nov 18.
9
Time series forecasting of new cases and new deaths rate for COVID-19 using deep learning methods.
Results Phys. 2021 Aug;27:104495. doi: 10.1016/j.rinp.2021.104495. Epub 2021 Jun 26.
10
Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.
PLoS One. 2017 May 2;12(5):e0176690. doi: 10.1371/journal.pone.0176690. eCollection 2017.

Cited by

1
Feature selection for specific prediction targets at the user level in a district heating network.
Sci Rep. 2025 Aug 14;15(1):29789. doi: 10.1038/s41598-025-15777-0.
2
Application of a Weighted Absolute Percentage Error-Based Method for Calculating the Aggregate Accuracy of Reported Malaria Surveillance Data.
Am J Trop Med Hyg. 2025 Apr 22;113(1):37-41. doi: 10.4269/ajtmh.24-0804. Print 2025 Jul 2.
3
Forecasting mental states in schizophrenia using digital phenotyping data.
PLOS Digit Health. 2025 Feb 7;4(2):e0000734. doi: 10.1371/journal.pdig.0000734. eCollection 2025 Feb.
4
Ocean wave prediction using Long Short-Term Memory (LSTM) and Extreme Gradient Boosting (XGBoost) in Tuban Regency for fisherman safety.
MethodsX. 2024 Nov 2;13:103031. doi: 10.1016/j.mex.2024.103031. eCollection 2024 Dec.
5
Documenting Trends in Malaria Data Reporting Accuracy Using Routine Data Quality Audits in Zambia, 2015-2022.
Am J Trop Med Hyg. 2024 Nov 26;112(2):274-285. doi: 10.4269/ajtmh.24-0429. Print 2025 Feb 5.
6
Avoiding common machine learning pitfalls.
Patterns (N Y). 2024 Aug 28;5(10):101046. doi: 10.1016/j.patter.2024.101046. eCollection 2024 Oct 11.
7
Predicting Mood Based on the Social Context Measured Through the Experience Sampling Method, Digital Phenotyping, and Social Networks.
Adm Policy Ment Health. 2024 Jul;51(4):455-475. doi: 10.1007/s10488-023-01328-0. Epub 2024 Jan 10.
8
Modeling information diffusion in social media: data-driven observations.
Front Big Data. 2023 May 17;6:1135191. doi: 10.3389/fdata.2023.1135191. eCollection 2023.