Stock Index Spot-Futures Arbitrage Prediction Using Machine Learning Models

Author information

Sheng Yankai, Ma Ding

Affiliation

School of Economics, Wuhan University of Technology, Wuhan 430070, China.

Publication information

Entropy (Basel). 2022 Oct 13;24(10):1462. doi: 10.3390/e24101462.

DOI:10.3390/e24101462

PMID:37420482

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9601484/

Abstract

With the development of quantitative finance, machine learning methods applied to finance have received significant attention from researchers, investors, and traders. However, work on stock index spot-futures arbitrage remains rare, and what exists is mostly retrospective rather than anticipatory of arbitrage opportunities. To close this gap, this study applies machine learning approaches to historical high-frequency data to forecast spot-futures arbitrage opportunities for the China Securities Index (CSI) 300. First, the possibility of spot-futures arbitrage opportunities is identified through econometric models. Then, Exchange-Traded-Fund (ETF)-based portfolios are built to track the movements of the CSI 300 with the smallest tracking error. A strategy consisting of no-arbitrage intervals and unwinding timing indicators is derived and shown to be profitable in a back-test. For forecasting, four machine learning methods are used to predict the indicator thus obtained: Least Absolute Shrinkage and Selection Operator (LASSO), Extreme Gradient Boosting (XGBoost), Back Propagation Neural Network (BPNN), and Long Short-Term Memory neural network (LSTM). The performance of each algorithm is compared from two perspectives. One is an error perspective based on the Root-Mean-Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and goodness of fit (R²). The other is a return perspective based on the trade yield and the number of arbitrage opportunities captured. Finally, a performance heterogeneity analysis is conducted by separating bull and bear markets. The results show that LSTM outperforms all other algorithms over the entire time period, with an RMSE of 0.00813, a MAPE of 0.70 percent, an R² of 92.09 percent, and an arbitrage return of 58.18 percent. Meanwhile, under certain market conditions, namely the bull market and the bear market considered separately over shorter periods, LASSO can outperform.
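The abstract names three error measures but does not state their formulas. The sketch below uses the standard textbook definitions, which are assumed here to match the authors' usage. The goodness-of-fit measure is read as the coefficient of determination (R squared), which is also an assumption. The function names and the sample values are illustrative only and are not taken from the paper.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration; not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(actual, predicted):.5f}")
print(f"MAPE = {mape(actual, predicted):.2f} percent")
print(f"R^2  = {r_squared(actual, predicted):.4f}")
```

Under these definitions, lower RMSE and MAPE and a higher R squared indicate a closer fit, which is the sense in which the abstract ranks the four algorithms on the error perspective.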
