Approximation and adaptive control of Markov processes: Average reward criterion
We consider a class of discrete-time Markov control processes with Borel state and action spaces, and ℝ^k-valued i.i.d. disturbances with unknown density ρ. Supposing possibly unbounded costs, we combine suitable density estimation methods for ρ with approximation procedures for the optimal cost function to show the existence of a sequence of minimizers converging to an optimal stationary policy.
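As a rough illustration of the estimation step (a minimal sketch assuming the unknown density ρ is estimated nonparametrically from the observed i.i.d. disturbances in one dimension; the paper's actual estimator and bandwidth rule may differ), a kernel density estimate can be formed as follows:

    import numpy as np

    def kde(samples, x, bandwidth):
        """Gaussian kernel density estimate of the disturbance density,
        evaluated at the points x, from observed i.i.d. disturbances
        `samples`. (Illustrative 1-D sketch; estimator is an assumption.)"""
        samples = np.asarray(samples, dtype=float)[:, None]  # shape (n, 1)
        x = np.asarray(x, dtype=float)[None, :]              # shape (1, m)
        z = (x - samples) / bandwidth
        k = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
        return k.mean(axis=0) / bandwidth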
The paper deals with a class of discrete-time stochastic control processes under a discounted optimality criterion with random discount rate, and possibly unbounded costs. The state process and the discount process evolve according to a pair of coupled difference equations.
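One common form of such a model (an illustrative assumption; the paper's exact equations are not reproduced here) is

    x_{t+1} = F(x_t, α_t, a_t, ξ_t),   α_{t+1} = G(α_t, η_t),   t = 0, 1, ...,

where x_t is the state, α_t the discount process, a_t the control, and ξ_t, η_t are disturbance sequences.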
We study the limit behavior of certain classes of dependent random sequences (processes) which do not possess the Markov property. Assuming these processes depend on a control parameter, we show that the optimization of the control can be reduced to a problem of nonlinear optimization. Under certain hypotheses, we establish the stability of such optimization problems.
This paper considers discrete-time Markov control processes on Borel spaces, with possibly unbounded costs, and the long run average cost (AC) criterion. Under appropriate hypotheses on weighted norms for the cost function and the transition law, the existence of solutions to the average cost optimality inequality and the average cost optimality equation is shown; these in turn yield the existence of AC-optimal and AC-canonical policies, respectively.
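For reference, in this setting the average cost optimality equation typically takes the form

    ρ* + h(x) = min_{a ∈ A(x)} [ c(x,a) + ∫ h(y) Q(dy | x,a) ],   x ∈ X,

and the optimality inequality is the same expression with "=" replaced by "≥". The weighted-norm hypotheses refer to the norm ‖v‖_w = sup_{x ∈ X} |v(x)|/w(x) for a weight function w ≥ 1, which is the standard device for handling unbounded costs.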
This paper shows the convergence of the value iteration (or successive approximations) algorithm for average cost (AC) Markov control processes on Borel spaces, with possibly unbounded cost, under appropriate hypotheses on weighted norms for the cost function and the transition law. It is also shown that the aforementioned convergence implies strong forms of AC-optimality and the existence of forecast horizons.
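For a finite-state, finite-action toy version of the AC value iteration discussed here (a minimal sketch; the paper works on Borel spaces under weighted-norm hypotheses, which this finite illustration does not capture), relative value iteration looks like:

    import numpy as np

    def relative_value_iteration(P, c, iters=10_000, tol=1e-10):
        """P: transition law, shape (S, A, S); c: costs, shape (S, A).
        Returns an approximate optimal average cost g, relative values h,
        and a greedy stationary policy."""
        S, A = c.shape
        h = np.zeros(S)
        for _ in range(iters):
            Q = c + np.einsum('sax,x->sa', P, h)  # one-step lookahead
            T = Q.min(axis=1)
            g = T[0]                 # normalize at a reference state
            h_new = T - g
            done = np.max(np.abs(h_new - h)) < tol
            h = h_new
            if done:
                break
        return g, h, Q.argmin(axis=1)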
We consider semi-Markov control models with Borel state and action spaces, possibly unbounded costs, and holding times with a generalized exponential distribution with unknown mean θ. Assuming that such a distribution does not depend on the state-action pairs, we introduce a Bayesian estimation procedure for θ, which combined with a variant of the vanishing discount factor approach yields average cost optimal policies.
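The vanishing discount approach referred to here rests on the classical relations (stated loosely, under suitable conditions)

    ρ* = lim_{α↑1} (1 − α) V_α(x)   and   h(x) = lim_{α↑1} [ V_α(x) − V_α(x₀) ],

where V_α is the α-discounted optimal value function, x₀ is a fixed reference state, and ρ* and h are the optimal average cost and a relative value function.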
This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.
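One classical formalization of cost-biased estimation (a sketch of the general idea, not necessarily the exact scheme of this paper) selects, at time t,

    θ̂_t ∈ argmin { J(θ) : L_t(θ) ≥ sup_{θ'} L_t(θ') − δ_t },

i.e., among near-maximizers of the likelihood L_t, the parameter with the smallest optimal cost J(θ) is preferred; this bias keeps the estimates from settling on parameters that overstate the achievable cost.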
The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches to building such agents when their environments are unknown and large. To learn hierarchical models, bottom-up methods, in which learning proceeds layer by layer from the lowest to the highest layer, are already widely used in research fields such as hidden...
In this article we present a generalization of Markov Decision Processes with discrete time where the immediate rewards in each period are not deterministic but random, with the first two moments of the distribution given. Formulas are developed to calculate the expected value and the variance of the reward of the process; these formulas generalize and partially correct earlier results. We make some observations about the distribution of rewards for processes with finite or infinite horizon and...
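A concrete finite-state instance of the kind of moment formulas described (a sketch assuming each period's reward depends only on the current state, has given mean and variance, and is conditionally independent of the future path; the article's exact formulas may be more general):

    import numpy as np

    def reward_moments(P, mu, sigma2, n):
        """First two moments of the n-step accumulated reward of a Markov
        reward chain. P: (S, S) transition matrix; mu, sigma2: per-state
        reward means and variances. Returns (mean, variance) by state."""
        S = len(mu)
        m1 = np.zeros(S)   # E[W_k | X_0 = i]
        m2 = np.zeros(S)   # E[W_k^2 | X_0 = i]
        for _ in range(n):
            Pm1 = P @ m1
            # E[(R_0 + W')^2] with R_0 independent of W' given X_0:
            m2 = (sigma2 + mu**2) + 2.0 * mu * Pm1 + P @ m2
            m1 = mu + Pm1
        return m1, m2 - m1**2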
In this paper we introduce a new modeling paradigm for developing a decision process representation called the Colored Decision Process Petri Net (CDPPN). It extends the Colored Petri Net (CPN) theoretic approach by including Markov decision processes. CPNs are used for process representation, taking advantage of their formal semantics and graphical display. A Markov decision process is utilized as a tool for trajectory planning via a utility function. The main point of the CDPPN is its ability to...
This paper focuses on the constrained optimality of discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, Borel state and compact Borel action spaces, and possibly unbounded costs. Using the properties of the so-called occupation measures of policies and the technique of transforming the original constrained optimality problem for DTMDPs into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.
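To make the occupation-measure/convex-program connection concrete, here is a minimal finite state-action sketch with a constant discount factor γ (an assumption for illustration; the paper treats state-dependent discount factors on Borel spaces, which this linear program does not capture):

    import numpy as np
    from scipy.optimize import linprog

    def constrained_mdp_lp(P, c, d, kappa, gamma, mu0):
        """P: (S, A, S) transition law; c, d: (S, A) cost and constraint
        cost; kappa: bound on the constraint cost of the normalized
        occupation measure; mu0: initial distribution."""
        S, A = c.shape
        # Flow constraints on the occupation measure rho(s, a) >= 0:
        # sum_a rho(s,a) - gamma * sum_{s',a'} P(s|s',a') rho(s',a')
        #   = (1 - gamma) * mu0(s)
        A_eq = np.zeros((S, S * A))
        for s in range(S):
            for sp in range(S):
                for a in range(A):
                    A_eq[s, sp * A + a] = float(s == sp) - gamma * P[sp, a, s]
        res = linprog(c.reshape(-1),
                      A_ub=d.reshape(1, -1), b_ub=[kappa],
                      A_eq=A_eq, b_eq=(1.0 - gamma) * mu0,
                      bounds=[(0, None)] * (S * A))
        assert res.success, res.message
        rho = res.x.reshape(S, A)
        tot = rho.sum(axis=1, keepdims=True)
        # Randomized stationary policy induced by the occupation measure
        # (uniform on states the measure never visits):
        policy = np.divide(rho, tot, out=np.full((S, A), 1.0 / A),
                           where=tot > 0)
        return policy, res.fun  # res.fun = (1 - gamma) * discounted cost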
We consider two parallel M/M/1 queues. The server at one of the queues is subject to intermittent breakdowns. Using the theory of dynamic programming, we determine an optimal threshold policy, which consists of transferring, when necessary, customers arriving at the first queue to the second queue in order to minimize an instantaneous cost that depends on the two queue lengths.
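A threshold policy of this kind can be stated in a couple of lines (a hypothetical form for illustration only; the actual threshold structure and its dependence on the server state are as derived in the paper):

    def transfer_arrival(q1, q2, server1_up, L):
        """Hypothetical threshold rule: transfer an arrival at queue 1 to
        queue 2 when server 1 is down and queue 1 exceeds queue 2 by >= L."""
        return (not server1_up) and (q1 - q2 >= L)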
This paper deals with discrete-time Markov control processes in Borel spaces with unbounded rewards. Under suitable hypotheses, we show that a randomized stationary policy is optimal for a certain expected constrained problem (ECP) if and only if it is optimal for the corresponding pathwise constrained problem (pathwise CP). Moreover, we show that a certain parametric family of unconstrained optimality equations yields convergence properties that lead to an approximation scheme which allows us to...
This paper deals with Markov Control Processes (MCPs) on Euclidean spaces with an infinite horizon and a discounted total cost. First, MCPs arising from deterministic controlled systems are analyzed. For such MCPs, conditions are given that make it possible to establish the equation known in the economics literature as the Euler Equation (EE). An example is also presented of a Markov Control Process with a deterministic controlled system where, to obtain the optimal value function,...
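A classical instance of the Euler Equation alluded to here (the standard one-sector growth model, given as an illustration rather than the paper's exact setting): with dynamics x_{t+1} = f(x_t) − c_t, utility u, and discount factor β, optimal interior consumption paths satisfy

    u′(c_t) = β u′(c_{t+1}) f′(x_{t+1}),   t = 0, 1, ...,

which characterizes the optimal policy without solving the dynamic programming equation directly.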