Suárez-Cetrulo Andrés L, Cervantes Alejandro, Quintana David
Department of Computer Science, Universidad Carlos III de Madrid, Leganés, 28911 Madrid, Spain.
Centre for Applied Data Analytics Research, University College Dublin, D04 V2N9 Dublin, Ireland.
Entropy (Basel). 2019 Jan 1;21(1):25. doi: 10.3390/e21010025.
In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has underlined the non-stationary nature of the markets and the likelihood of drastic structural or concept changes. Traditional systems are unable to adapt to these changes, or adapt only slowly. Ensemble-based systems are widely known for their good results in predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends a version of Random Forest for evolving data streams by adding a mechanism to store and handle a shared collection of inactive trees, called the concept history, which holds memories of the way market operators reacted in similar circumstances. This mechanism works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on predicting the direction of price movements one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and achieves competitive results.
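The replacement strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Tree` and `Ensemble` classes, the `replace_on_drift` function, and the `recent_error` field are hypothetical names introduced here, and the sketch assumes each candidate tree already carries an error estimate over a recent window of the stream. It also assumes, as one plausible policy, that the replaced tree is moved into the concept history so it can be recalled if its concept recurs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity-based equality, so list.remove targets the exact object
class Tree:
    """Stand-in for an online decision tree; only the fields this sketch needs."""
    name: str
    recent_error: float  # assumed error estimate on a recent window, in [0, 1]


@dataclass
class Ensemble:
    active: List[Tree]
    # Shared collection of inactive trees (the "concept history").
    concept_history: List[Tree] = field(default_factory=list)


def replace_on_drift(ens: Ensemble, slot: int, background: Tree) -> Tree:
    """Swap the drifting active tree in `slot` for the best available alternative."""
    # Candidates: the newly trained background tree plus every stored tree.
    # Listing the background tree first makes it win ties in min().
    candidates = [background] + ens.concept_history
    best = min(candidates, key=lambda t: t.recent_error)

    # A tree recalled from the history becomes active again, so it leaves the store.
    if best is not background:
        ens.concept_history.remove(best)

    # Assumed policy: keep the replaced tree in case its concept recurs.
    ens.concept_history.append(ens.active[slot])
    ens.active[slot] = best
    return best


# Usage: a stored tree beats the background tree, so it is recalled.
ens = Ensemble(
    active=[Tree("t0", 0.48)],
    concept_history=[Tree("old_bull", 0.31), Tree("old_crash", 0.55)],
)
chosen = replace_on_drift(ens, 0, Tree("bg0", 0.40))
print(chosen.name)  # old_bull
```

Choosing the candidate by a single error score keeps the decision cheap, in line with the fast reaction times the abstract asks for. How the error is estimated and when drift is signalled are left outside this sketch.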