Department of Management, Applied College, Jazan University, Jazan, KSA, Saudi Arabia.
Neural Netw. 2024 Nov;179:106537. doi: 10.1016/j.neunet.2024.106537. Epub 2024 Jul 14.
Portfolio management (PM) is a widely used financial process that involves periodically reallocating a given amount of capital across a portfolio of assets, with the main aim of maximizing profitability subject to a certain level of risk. Given the inherent dynamism of stock markets and the emphasis on long-term performance, reinforcement learning (RL) has become a dominant approach for solving the portfolio management problem automatically and efficiently. Nevertheless, existing RL-based PM methods consider only the price variations of portfolio assets and their implications, while overlooking the significant relationships among different assets in the market, which are highly valuable for managerial decisions. To close this gap, this paper introduces a novel deep model that combines two subnetworks: one learns a temporal representation of historical prices using a refined temporal learner, while the other learns the relationships between different stocks in the market using a relation graph learner (RGL). The two learners are then integrated into a curriculum RL scheme that formulates PM as a curriculum Markov decision process (MDP), in which an adaptive curriculum policy enables the agent to adaptively minimize risk and maximize cumulative return. Proof-of-concept experiments on data from three public stock indices (S&P500, NYSE, and NASDAQ) show that the proposed framework improves portfolio management performance over competing RL solutions.
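The abstract describes the model only at a high level. As a rough, hypothetical illustration of how PM can be cast as an MDP (state built from temporal and relational features, action = portfolio weights, reward = return traded off against risk), the sketch below runs one episode with a fixed, non-learned policy. None of it comes from the paper: `relation_smooth` is a simple neighbour-averaging stand-in for the RGL, the one-period log price change stands in for the temporal learner, and the variance-penalised log-return reward with its `risk_aversion` parameter is an assumption. No RL training or curriculum is performed.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Map unconstrained scores to long-only portfolio weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_smooth(features: Sequence[float],
                    adjacency: Sequence[Sequence[int]]) -> List[float]:
    # Hypothetical stand-in for a relation graph learner: average each stock's
    # feature with the features of the stocks it is linked to.
    out = []
    for i, row in enumerate(adjacency):
        group = [features[i]] + [features[j] for j, linked in enumerate(row)
                                 if linked and j != i]
        out.append(sum(group) / len(group))
    return out


def step_reward(weights: Sequence[float], price_relatives: Sequence[float],
                past_log_returns: Sequence[float],
                risk_aversion: float = 0.5) -> Tuple[float, float]:
    # One-period growth factor of the rebalanced portfolio:
    # sum_i w_i * (p_{t+1,i} / p_{t,i}).
    growth = sum(w * y for w, y in zip(weights, price_relatives))
    log_ret = math.log(growth)
    history = list(past_log_returns) + [log_ret]
    mean = sum(history) / len(history)
    variance = sum((r - mean) ** 2 for r in history) / len(history)
    # Assumed reward: log return minus a variance-based risk penalty.
    return log_ret - risk_aversion * variance, log_ret


def run_episode(prices: Sequence[Sequence[float]],
                adjacency: Sequence[Sequence[int]],
                risk_aversion: float = 0.5) -> Tuple[float, float]:
    # prices[t][i] is the closing price of stock i at time t.
    log_returns: List[float] = []
    total_reward = 0.0
    for t in range(1, len(prices) - 1):
        n = len(prices[t])
        # Toy "temporal" feature: last one-period log price change.
        momentum = [math.log(prices[t][i] / prices[t - 1][i]) for i in range(n)]
        # Fixed policy: relation-smoothed momentum -> softmax weights.
        weights = softmax(relation_smooth(momentum, adjacency))
        relatives = [prices[t + 1][i] / prices[t][i] for i in range(n)]
        reward, log_ret = step_reward(weights, relatives, log_returns, risk_aversion)
        log_returns.append(log_ret)
        total_reward += reward
    # Return the accumulated reward and the cumulative log return.
    return total_reward, sum(log_returns)
```

For example, with three stocks where only the first two are linked:

```python
prices = [[10.0, 20.0, 5.0], [10.5, 19.0, 5.2], [11.0, 19.5, 5.1], [11.2, 20.5, 5.3]]
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
total_reward, cum_log_return = run_episode(prices, adjacency)
```

Because the risk penalty is never negative, the accumulated reward can only fall below the cumulative log return; in an actual RL agent the fixed policy would be replaced by a learned one.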