Research on information leakage in time series prediction based on empirical mode decomposition.

Author information

Yang Xinyi, Li Jingyi, Jiang Xuchu

Affiliations

School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China.

College of Finance and Statistics, Hunan University, Changsha, 410082, China.

Publication information

Sci Rep. 2024 Nov 16;14(1):28362. doi: 10.1038/s41598-024-80018-9.

Abstract

Time series analysis predicts the future from historical data and has a wide range of applications in finance, economics, meteorology, biology, engineering, and other fields. Although combining decomposition techniques with machine learning algorithms can effectively handle the prediction of nonstationary sequences, this decomposition-integration-prediction strategy has a serious defect. When the series is decomposed before it is divided into a training set and a test set, information from the test set leaks into the training data during decomposition, which ultimately creates an illusion of high prediction accuracy. This paper proposes three improvement strategies for this "information leakage" problem: sliding window decomposition (SW-EMD), single training and multiple decomposition (STMP-EMD), and multiple training and multiple decomposition (MTMP-EMD). They are combined with a bidirectional multiscale temporal convolutional network (MSBTCN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism (DMAttention) that introduces a dependency matrix based on cosine similarity, and the resulting model is applied to water quality prediction. The experimental results show that the model performs well in predicting three water quality indicators (pH, DO, and KMnO4), and that the three models proposed in this paper improve on mainstream LSTM models by 1.958% in RMSE and 0.853% in MAPE.
The key contributions of this study are as follows: (1) three methods are proposed to improve EMD-type decomposition, which effectively solve the "information leakage" problem in current models that rely on EMD-type decomposition; (2) a new CEEMDAN-MSBTCN-BiLSTM-DMAttention model structure is built by incorporating the improved EMD-type decomposition methods; and (3) the three improved decomposition methods solve the "information leakage" problem while also optimizing the prediction model. This study provides an effective experimental method for water quality prediction and can effectively address the problem of model "overfitting" during training and testing when EMD-type decompositions are used.
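
The leakage described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `toy_decompose` is a hypothetical stand-in (a centered moving-average trend plus a residual), not EMD or CEEMDAN, and `sliding_window_features` only shows the general idea of decomposing past-only windows. It is not the authors' SW-EMD implementation, whose details the abstract does not give.

```python
from typing import Callable, List, Sequence

Decomposer = Callable[[Sequence[float]], List[List[float]]]


def toy_decompose(series: Sequence[float]) -> List[List[float]]:
    """Hypothetical stand-in for an EMD-type decomposition.

    Splits the series into a centered moving-average trend and a residual.
    Each output value depends on neighbours on both sides, including later
    points, which is the property that makes leakage possible.
    """
    n = len(series)
    trend = []
    for i in range(n):
        lo, hi = max(0, i - 2), min(n, i + 3)
        window = series[lo:hi]
        trend.append(sum(window) / len(window))
    residual = [x - t for x, t in zip(series, trend)]
    return [trend, residual]


def leaky_features(series: Sequence[float], split: int,
                   decompose: Decomposer) -> List[List[float]]:
    """Decompose the whole series, then cut off the training part.

    The training components near the split were computed using test values.
    """
    return [component[:split] for component in decompose(series)]


def sliding_window_features(series: Sequence[float], window: int,
                            decompose: Decomposer) -> List[List[List[float]]]:
    """Decompose only series[t - window:t] for each forecast origin t.

    No value at or after t is visible when the window is decomposed.
    """
    return [decompose(series[t - window:t])
            for t in range(window, len(series) + 1)]


if __name__ == "__main__":
    series = [float(i % 7) for i in range(40)]
    split, window = 30, 10

    # Change only the first test-period value and compare the results.
    altered = list(series)
    altered[split] = 100.0

    leak = leaky_features(series, split, toy_decompose) != \
        leaky_features(altered, split, toy_decompose)
    n_train_windows = split - window + 1
    safe = sliding_window_features(series, window, toy_decompose)[:n_train_windows] == \
        sliding_window_features(altered, window, toy_decompose)[:n_train_windows]
    print("training components changed by a test value:", leak)
    print("past-only windows unaffected by the test value:", safe)
```

In this sketch, a change to a single test-period value alters the training components in the decompose-then-split setup. The windows that end at or before the split are unaffected, because each one is decomposed using past values only.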


