Huo Xiaoguang, Fu Feng
Department of Mathematics, Cornell University, Ithaca, NY 14850, USA.
Department of Mathematics, Dartmouth College, Hanover, NH 03755, USA.
R Soc Open Sci. 2017 Nov 15;4(11):171377. doi: 10.1098/rsos.171377. eCollection 2017 Nov.
Sequential portfolio selection has attracted increasing interest in the machine learning and quantitative finance communities in recent years. As a mathematical framework for reinforcement learning policies, the stochastic multi-armed bandit problem addresses the primary difficulty in sequential decision-making under uncertainty, namely the exploration versus exploitation dilemma, and therefore provides a natural connection to portfolio selection. In this paper, we incorporate risk awareness into the classic multi-armed bandit setting and introduce an algorithm to construct a portfolio. By filtering assets based on the topological structure of the financial market and combining the optimal multi-armed bandit policy with the minimization of a coherent risk measure, we achieve a balance between risk and return.
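The exploration versus exploitation dilemma mentioned in the abstract can be illustrated with a minimal sketch of the standard UCB1 index policy on Bernoulli arms. This is not the paper's algorithm: it omits the asset filtering and the coherent risk measure entirely, and the names `ucb1_select`, `run_ucb1` and `true_means` are hypothetical, chosen only for this illustration.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Pick the arm to play at round t (1-indexed) under the UCB1 rule."""
    # Play every arm once before trusting any estimate.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical mean (exploitation) plus a confidence bonus (exploration)
    # that shrinks as an arm is played more often.
    scores = [m + math.sqrt(2.0 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(true_means, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms (illustrative assumption, not market data)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # number of times each arm was played
    means = [0.0] * k  # running average reward of each arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    counts, means = run_ucb1([0.2, 0.5, 0.8], rounds=2000)
    print(counts, means)
```

In a portfolio reading of this sketch, each arm would stand for an asset and each reward for a period return. A risk-aware variant such as the one the abstract describes would additionally weigh a risk term against this return-seeking choice.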