Chen Ying, Zhao Jingjing, Qin Jiancheng, Li Hua, Zhang Zili
School of Economics and Management, Harbin Institute of Technology, Harbin 150001, China.
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
Fundam Res. 2021 Oct 27;3(3):392-402. doi: 10.1016/j.fmre.2021.09.011. eCollection 2023 May.
Numerical weather prediction (NWP) data contain inherent inaccuracies, such as a low NWP wind speed that corresponds to high actual wind power generation. This study aims to reduce the negative effects of such inaccuracies by proposing a pure data-selection framework (PDF) that chooses useful data prior to modeling, thus improving the accuracy of day-ahead wind power forecasting. Briefly, we divide the entire NWP training dataset into many small subsets and then use a validation set to select the best combination of subsets for building a forecasting model. Although smaller subsets increase selection flexibility, they can also produce billions of subset combinations, which makes the search computationally expensive. To address this problem, we incorporate metamodeling and optimization steps into PDF. For these two steps, we propose a metamodeling algorithm based on the design and analysis of computer experiments and a heuristic-exhaustive search optimization algorithm, respectively. Experimental results demonstrate that (1) it is necessary to select data before constructing a forecasting model; (2) using smaller subsets will likely increase selection flexibility, leading to a more accurate forecasting model; (3) PDF can generate a better training dataset than similarity-based data selection methods (e.g., K-means and support vector classification); and (4) choosing data before building a forecasting model produces a more accurate forecasting model than using a machine learning method to construct a model directly.
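The subset-selection idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give implementation details, so a plain greedy forward search stands in for the metamodeling and heuristic-exhaustive search steps, and a least-squares line stands in for the forecasting model. All names (`fit_linear`, `rmse`, `select_subsets`) and the synthetic wind data are hypothetical. The search problem is large because K subsets admit 2^K - 1 non-empty combinations, which already exceeds one billion at K = 30.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def rmse(coef, X, y):
    """Root-mean-square error of the fitted line on (X, y)."""
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))


def select_subsets(X_train, y_train, X_val, y_val, n_subsets):
    """Greedy forward selection of training subsets, scored on a validation set.

    Assumption: greedy search replaces the paper's metamodel-guided search.
    """
    # Split the training indices into contiguous, roughly equal subsets.
    chunks = np.array_split(np.arange(len(X_train)), n_subsets)
    chosen, best_err = [], np.inf
    improved = True
    while improved:
        improved, best_k = False, None
        for k in range(n_subsets):
            if k in chosen:
                continue
            idx = np.concatenate([chunks[j] for j in chosen + [k]])
            err = rmse(fit_linear(X_train[idx], y_train[idx]), X_val, y_val)
            if err < best_err:  # keep the best candidate of this round
                best_err, best_k, improved = err, k, True
        if improved:
            chosen.append(best_k)
    return sorted(chosen), best_err


# Synthetic data (hypothetical): power grows linearly with NWP wind speed.
rng = np.random.default_rng(0)
n = 400
speed = rng.uniform(3, 12, size=(n, 1))
power = 0.1 * speed[:, 0] + rng.normal(0, 0.02, n)
# Corrupt the first 100 samples: low NWP speed paired with high power.
power[:100] = 1.2 - 0.1 * speed[:100, 0]

X_val = rng.uniform(3, 12, size=(100, 1))
y_val = 0.1 * X_val[:, 0] + rng.normal(0, 0.02, 100)

chosen, err = select_subsets(speed, power, X_val, y_val, n_subsets=8)
full_err = rmse(fit_linear(speed, power), X_val, y_val)
print("selected subsets:", chosen)
print("validation RMSE, selected vs. full:", err, full_err)
```

In this toy setting the two corrupted subsets (indices 0 and 1) are left out because adding either one raises the validation error. A greedy search evaluates at most on the order of K^2 candidate models rather than 2^K - 1, at the cost of possibly missing the best combination.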