Efficiency of the Stochastic Approximation Method
In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the basic TD(0) or TD(λ) algorithm is run and a model of the environment is built. In the epoch mode, the distances from previously visited (past-active) states to the terminal state are computed on the basis of this model. These distances, together with the reinforcement signal received in the terminal state, are used to improve the agent's policy.
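The two modes described above can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the function names, the predecessor-set model, the use of breadth-first search for the distances, and the values of `alpha` and `gamma` are all assumptions made for illustration. The abstract does not say how the distances are combined with the terminal reinforcement signal, so that step is left out.

```python
from collections import defaultdict, deque


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Incremental mode: one tabular TD(0) step.

    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
    alpha and gamma are illustrative values, not taken from the source.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def record_transition(predecessors, state, next_state):
    """Incremental mode: extend the (assumed) environment model.

    The model here is only a map from each state to the set of states
    from which it has been reached.
    """
    predecessors[next_state].add(state)


def distances_to_terminal(predecessors, terminal):
    """Epoch mode: number of steps from each visited state to the terminal state.

    Breadth-first search backwards over the recorded transitions
    (an assumed choice; the source does not name the method).
    """
    dist = {terminal: 0}
    queue = deque([terminal])
    while queue:
        current = queue.popleft()
        for prev in predecessors.get(current, ()):
            if prev not in dist:
                dist[prev] = dist[current] + 1
                queue.append(prev)
    return dist


# Hypothetical episode: states 0..3, state 3 is terminal, reward 1 on arrival.
values = defaultdict(float)
model = defaultdict(set)
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]
for s, r, s_next in episode:
    td0_update(values, s, r, s_next)
    record_transition(model, s, s_next)

print(distances_to_terminal(model, terminal=3))
```

In this sketch the distances are recomputed once per episode from the stored model, while the value estimates are updated after every step.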
First-principles modeling of a feedwater heater operating in a coal-fired power unit is presented, along with a theoretical discussion of its structural simplifications, parameter estimation, and dynamic validation. The model is part of the component library of a modeling environment called the Virtual Power Plant (VPP). The main purpose of the VPP is the simulation of power generation installations for early-warning diagnostic applications. The model was developed in the Matlab/Simulink...