Wu Dingming, Wang Xiaolong, Su Jingyong, Tang Buzhou, Wu Shaocong
The College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
Entropy (Basel). 2020 Oct 15;22(10):1162. doi: 10.3390/e22101162.
Time series prediction is widely applied in the finance industry, for example to forecast stock market prices and commodity prices. In recent years, machine learning methods have been widely used for financial time series prediction. How financial time series data are labeled affects the prediction accuracy of machine learning models and, in turn, the final investment returns, which makes labeling an active research topic. Existing labeling methods for financial time series mainly label each data point by comparing it with the data of a short period in the future. However, financial time series are typically non-linear and show obvious short-term randomness. As a result, these labeling methods do not capture the continuous trend features of financial time series, and their labels differ from real market trends. In this paper, a new labeling method called "continuous trend labeling" is proposed to address this problem. For the feature preprocessing stage, this paper proposes a new method that avoids the look-ahead bias present in traditional data standardization or normalization. A detailed logical explanation is then given, continuous trend labeling is formally defined, and an automatic labeling algorithm is presented to extract the continuous trend features of financial time series data. Experiments on the Shanghai Composite Index, the Shenzhen Component Index, and several Chinese stocks show that the proposed labeling method outperforms existing state-of-the-art labeling methods in classification accuracy and other classification evaluation metrics. The results also show that deep learning models such as LSTM and GRU are better suited to predicting financial time series data.
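The abstract does not spell out how the proposed preprocessing avoids look-ahead bias. The sketch below illustrates the general idea under an assumed approach: each value is standardized with the mean and standard deviation of a trailing window that ends at that value, so no statistic depends on later observations. Global standardization over the whole series, by contrast, uses a mean and variance computed partly from future data. The function name `causal_zscore` and the `window` parameter are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def causal_zscore(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Hypothetical look-ahead-free standardization (illustrative sketch).

    Each point x[t] is standardized with the mean and population standard
    deviation of the trailing window x[t - window + 1] .. x[t], so the
    result at time t never uses values after t. The first window - 1
    points have no full window yet and are returned as None.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    out: List[Optional[float]] = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
            continue
        past = series[t + 1 - window : t + 1]  # only current and past values
        mu = mean(past)
        sigma = pstdev(past)
        # A flat window has zero spread; map it to 0.0 instead of dividing by zero.
        out.append(0.0 if sigma == 0 else (series[t] - mu) / sigma)
    return out
```

Because the output at time `t` depends only on the trailing window, replacing a later value (for example the last price) leaves every earlier standardized value unchanged, which is the property a look-ahead-free preprocessing step needs.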
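The abstract gives neither the formal definition nor the algorithm for continuous trend labeling. As an illustration only, the sketch below contrasts the conventional rule described above (compare each point with the value a short time ahead) with an assumed zigzag-style trend rule. Under that rule, a trend is treated as reversed only when the price retraces by more than a fraction `omega` from the extreme reached during that trend, and every point is labeled with the direction of the trend segment it belongs to. The functions `fixed_horizon_labels` and `continuous_trend_labels` and the parameter `omega` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fixed_horizon_labels(prices: Sequence[float], horizon: int = 1) -> List[int]:
    """Conventional labeling: 1 if the price `horizon` steps ahead is higher, else 0.
    The last `horizon` points have no future value and get no label."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [1 if prices[t + horizon] > prices[t] else 0
            for t in range(len(prices) - horizon)]


def continuous_trend_labels(prices: Sequence[float], omega: float) -> List[int]:
    """Sketch of trend-based labeling (an assumed rule, not the paper's algorithm).

    A trend counts as reversed only after the price retraces more than the
    fraction `omega` from the extreme reached during that trend. Each point
    gets the direction of the segment it starts in: 1 = up, 0 = down.
    """
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices")
    pivots = [0]    # indices of confirmed turning points
    first_dir = 0   # direction of the first segment: +1 up, -1 down
    direction = 0   # current trend direction; 0 = not yet determined
    ext = 0         # index of the extreme price within the current trend
    for i in range(1, n):
        p = prices[i]
        if direction == 0:
            # Wait for the first move larger than omega relative to the start.
            if p >= prices[0] * (1 + omega):
                direction = first_dir = 1
                ext = i
            elif p <= prices[0] * (1 - omega):
                direction = first_dir = -1
                ext = i
        elif direction == 1:
            if p > prices[ext]:
                ext = i                  # new high extends the uptrend
            elif p <= prices[ext] * (1 - omega):
                pivots.append(ext)       # the previous high was a peak
                direction, ext = -1, i
        else:
            if p < prices[ext]:
                ext = i                  # new low extends the downtrend
            elif p >= prices[ext] * (1 + omega):
                pivots.append(ext)       # the previous low was a trough
                direction, ext = 1, i
    if first_dir == 0:
        # No move ever exceeded omega: treat the whole series as one
        # segment whose direction is given by its endpoints.
        first_dir = 1 if prices[-1] > prices[0] else -1
    pivots.append(n - 1)

    labels = [0] * n
    seg_dir = first_dir
    for start, end in zip(pivots, pivots[1:]):
        for t in range(start, end):
            labels[t] = 1 if seg_dir == 1 else 0
        seg_dir = -seg_dir               # consecutive segments alternate direction
    labels[-1] = labels[-2]              # final point inherits the last segment's label
    return labels
```

For a noisy rise followed by a decline, `prices = [10, 10.4, 10.3, 11, 11.5, 11.3, 12, 11, 10.5, 10.6, 10]`, the one-step rule `fixed_horizon_labels(prices)` gives `[1, 0, 1, 1, 0, 1, 0, 0, 1, 0]` and flips on every small pullback, while `continuous_trend_labels(prices, 0.05)` gives `[1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]`: one upward run up to the peak at 12 and one downward run after it. This is the kind of short-term randomness versus continuous trend distinction the abstract describes.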