Song Zitao, Wang Yining, Qian Pin, Song Sifan, Coenen Frans, Jiang Zhengyong, Su Jionglong
Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China.
Department of Computer Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China.
Appl Intell (Dordr). 2023;53(12):15188-15203. doi: 10.1007/s10489-022-04217-5. Epub 2022 Nov 11.
As a fundamental problem in algorithmic trading, portfolio optimization aims to maximize cumulative return by continuously investing in various financial derivatives over a given time period. Recent years have seen a shift from traditional machine learning trading algorithms to reinforcement learning algorithms, which are better suited to sequential decision making. However, the exponential growth of imperfect and noisy financial data, which deterministic reinforcement learning strategies are supposed to leverage, makes it increasingly difficult to consistently obtain a profitable portfolio. In this work, we first reconstruct several deterministic and stochastic reinforcement learning algorithms as benchmarks. On this basis, we introduce a risk-aware reward function to balance risk and return. Importantly, we propose a novel interpretable stochastic reinforcement learning framework that tailors a stochastic policy parameterized by Gaussian mixtures and a distributional critic realized by quantiles to the problem of portfolio optimization. In our experiments, the proposed algorithm demonstrates superior performance on U.S. market stocks, achieving a 63.1% annual rate of return while reducing the maximum drawdown in market value by 10% when back-tested during the stock market crash around March 2020.
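The abstract names two building blocks without giving their details: a stochastic policy parameterized by Gaussian mixtures and a distributional critic realized by quantiles. The following is a minimal NumPy sketch of what such components might look like, not the authors' implementation. The function names, the diagonal-covariance mixture, the softmax mapping to long-only portfolio weights, and the use of the quantile-regression Huber loss are all assumptions made for illustration.

```python
import numpy as np


def sample_portfolio_weights(mix_logits, means, log_stds, rng):
    """Draw one action from a diagonal Gaussian mixture and map it to weights.

    mix_logits: (K,) unnormalized mixture coefficients
    means, log_stds: (K, n_assets) per-component parameters
    """
    # Normalize the mixture coefficients with a numerically stable softmax.
    mix_probs = np.exp(mix_logits - mix_logits.max())
    mix_probs /= mix_probs.sum()
    # Pick one mixture component, then sample from its Gaussian.
    k = rng.choice(len(mix_probs), p=mix_probs)
    noise = rng.standard_normal(means.shape[1])
    raw_action = means[k] + np.exp(log_stds[k]) * noise
    # Assumption: a softmax turns the raw action into long-only weights
    # that sum to 1; the abstract does not specify this mapping.
    z = np.exp(raw_action - raw_action.max())
    return z / z.sum()


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss for a quantile-based critic.

    pred_quantiles: (N,) critic outputs at midpoints tau_i = (i + 0.5) / N
    target_samples: (M,) samples of the target return distribution
    """
    n = len(pred_quantiles)
    taus = (np.arange(n) + 0.5) / n
    # Pairwise errors u[i, j] = target_j - pred_i.
    u = target_samples[None, :] - pred_quantiles[:, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| pushes each output toward its quantile.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Average over target samples, sum over predicted quantiles.
    return float((weight * huber / kappa).mean(axis=1).sum())
```

A hypothetical usage would call `sample_portfolio_weights` once per trading step with the policy network's outputs, and minimize `quantile_huber_loss` between the critic's predicted quantiles and bootstrapped return targets. The risk-aware reward is omitted here because the abstract does not state its form.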