

Deep Reinforcement Learning Microgrid Optimization Strategy Considering Priority Flexible Demand Side.

Authors

Sang Jinsong, Sun Hongbin, Kou Lei

Affiliations

Changchun Institute of Technology, School of Electrical Engineering, Changchun 130012, China.

National and Local Joint Engineering Research Center for Smart Distribution Network Measurement, Control and Safe Operation Technology, Changchun 130012, China.

Publication

Sensors (Basel). 2022 Mar 14;22(6):2256. doi: 10.3390/s22062256.

Abstract

As an efficient way to integrate multiple distributed energy resources (DERs) with the user side, a microgrid mainly faces the problems of the small-scale volatility, uncertainty and intermittency of DERs, together with demand-side uncertainty. The traditional microgrid takes a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads and the main grid is proposed. Centralized control of microgrid operation simplifies control of the reactive power and voltage of the distributed power supply and adjustment of the grid frequency; however, flexible loads then tend to aggregate and create demand peaks during electricity price valleys. Existing research accounts for the power constraints of the microgrid but fails to ensure a sufficient supply of electric energy for each individual flexible load. On the basis of the overall microgrid operating environment, this paper considers the response priority of each unit component of the TCLs and ESSs, so as to guarantee the power supply of the microgrid's flexible loads while minimizing the power input cost. Finally, optimization of the simulated environment is formulated as a Markov decision process (MDP), and training combines an offline stage with an online stage. Multithreaded training alone, lacking the ability to learn from historical data, yields low learning efficiency; an asynchronous advantage actor-critic augmented with an experience replay pool (Memory A3C, M-A3C) is therefore introduced to address the data correlation and non-stationary distribution problems during training.
The multithreaded operation of M-A3C efficiently learns the priority-based resource allocation on the demand side of the microgrid and improves flexible demand-side scheduling, greatly reducing the input cost. A comparison of the cost optimization results with those obtained with the proximal policy optimization (PPO) algorithm shows that the proposed algorithm performs better in terms of convergence and optimization economics.
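The abstract does not give pseudocode, so the following is only a minimal, illustrative sketch of the core M-A3C ingredient it describes: an advantage actor-critic update augmented with an experience replay pool ("memory library") that decorrelates training samples. The asynchronous multi-worker machinery, neural networks, and the actual microgrid environment are all omitted; the one-state, two-action MDP, the `ReplayBuffer` class, and all hyperparameters here are assumptions for demonstration, not the paper's implementation.

```python
import math
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool: stores past transitions and hands back
    random minibatches, breaking the correlation of sequential data."""
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, min(batch_size, len(self.buf)))

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy single-state MDP standing in for the microgrid environment:
# action 1 (e.g., dispatch the higher-priority unit) pays more reward.
def step(action):
    return 1.0 if action == 1 else 0.2

def train(episodes=2000, alpha=0.05, seed=0):
    random.seed(seed)
    prefs = [0.0, 0.0]   # actor: action preferences (softmax policy)
    value = 0.0          # critic: estimated state value V(s)
    buffer = ReplayBuffer()
    for _ in range(episodes):
        probs = softmax(prefs)
        a = 0 if random.random() < probs[0] else 1
        r = step(a)
        buffer.push((a, r))
        # Critic is trained on a replayed minibatch (the "memory" in M-A3C).
        for (_ba, br) in buffer.sample(8):
            value += alpha * (br - value)
        # Actor update uses the advantage A = r - V(s).
        adv = r - value
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += alpha * adv * grad
    return prefs, value

prefs, value = train()
# the higher-reward action should end up with the larger preference
```

In the full algorithm, each of several asynchronous worker threads would run this actor-critic loop against its own copy of the environment and push gradients to shared parameters; the replay pool is what the paper adds on top of plain A3C.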

