Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion

Evgueni I. Gordienko; J. Adolfo Minjárez-Sosa

Displaying similar documents to “Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion”

Approximation and estimation in Markov control processes under a discounted criterion

J. Adolfo Minjárez-Sosa (2004)

Kybernetika

We consider a class of discrete-time Markov control processes with Borel state and action spaces, and $\mathbb{R}^{k}$-valued i.i.d. disturbances with unknown density $\rho$. Supposing possibly unbounded costs, we combine suitable density estimation methods of $\rho$ with approximation procedures of the optimal cost function, to show the existence of a sequence $\{\hat{f}_{t}\}$ of minimizers converging to an optimal stationary policy $f_{\infty}$.

Approximation and adaptive control of Markov processes: Average reward criterion

Onésimo Hernández-Lerma (1987)

Kybernetika

Estimates of stability of Markov control processes with unbounded costs

Evgueni I. Gordienko, Francisco Salem-Silva (2000)

Kybernetika

For a discrete-time Markov control process with transition probability $p$, we compare the total discounted costs $V_{\beta}(\pi_{\beta})$ and $V_{\beta}(\tilde{\pi}_{\beta})$ obtained when applying the optimal control policy $\pi_{\beta}$ and its approximation $\tilde{\pi}_{\beta}$. The policy $\tilde{\pi}_{\beta}$ is optimal for an approximating process with transition probability $\tilde{p}$. The cost per stage of the processes considered may be unbounded. Under certain ergodicity assumptions we establish an upper bound for the relative stability index $[V_{\beta}(\tilde{\pi}_{\beta}) - V_{\beta}(\pi_{\beta})] / V_{\beta}(\pi_{\beta})$. This bound does not depend...
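
The relative stability index in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' method: it assumes a finite model with two states, two actions and bounded costs (the paper treats general processes with possibly unbounded costs), and all numbers and function names (`value_iteration`, `evaluate`, `P_tilde`) are hypothetical choices made for illustration.

```python
def q_value(P, c, beta, v, x, a):
    """One-step cost plus discounted expected cost-to-go."""
    return c[x][a] + beta * sum(P[x][a][y] * v[y] for y in range(len(v)))


def value_iteration(P, c, beta, iters=2000):
    """Approximate the optimal discounted cost and a greedy (minimizing) policy."""
    n = len(c)
    v = [0.0] * n
    for _ in range(iters):
        v = [min(q_value(P, c, beta, v, x, a) for a in range(len(c[x])))
             for x in range(n)]
    policy = [min(range(len(c[x])), key=lambda a: q_value(P, c, beta, v, x, a))
              for x in range(n)]
    return v, policy


def evaluate(P, c, beta, policy, iters=2000):
    """Discounted cost of a fixed stationary policy under transition law P."""
    n = len(c)
    v = [0.0] * n
    for _ in range(iters):
        v = [q_value(P, c, beta, v, x, policy[x]) for x in range(n)]
    return v


# Hypothetical toy data: c[x][a] is the cost, P[x][a][y] the transition law.
beta = 0.9
c = [[1.0, 2.0], [3.0, 0.5]]
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
# Approximating transition law (an assumed perturbation of P).
P_tilde = [[[0.6, 0.4], [0.3, 0.7]], [[0.4, 0.6], [0.5, 0.5]]]

v_opt, pi_opt = value_iteration(P, c, beta)          # optimal for the true process
_, pi_tilde = value_iteration(P_tilde, c, beta)      # optimal for the approximation
v_tilde = evaluate(P, c, beta, pi_tilde)             # approximation applied to the true process

# Relative stability index, computed per initial state.
index = [(v_tilde[x] - v_opt[x]) / v_opt[x] for x in range(len(c))]
print(index)
```

The index is nonnegative in every state because no policy can achieve a lower discounted cost than the optimal one; it equals zero when the approximating model happens to yield the same policy.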

Estimation and control in finite Markov decision processes with the average reward criterion

Rolando Cavazos-Cadena, Raúl Montes-de-Oca (2004)

Applicationes Mathematicae

This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action...
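
The frequency estimators mentioned in this abstract can be sketched as follows. This is an illustrative assumption about their general form (relative frequencies of observed transitions), not code from the cited work; the function name `frequency_estimates` and the sample trajectory are hypothetical.

```python
from collections import defaultdict


def frequency_estimates(trajectory):
    """Estimate p(y | x, a) by the relative frequency of observed transitions.

    trajectory: iterable of (state, action, next_state) triples.
    Returns a dict mapping (state, action) to a dict of next-state frequencies.
    Pairs that were never visited do not appear in the result.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for x, a, y in trajectory:
        counts[(x, a)][y] += 1

    estimates = {}
    for pair, successors in counts.items():
        total = sum(successors.values())
        estimates[pair] = {y: n / total for y, n in successors.items()}
    return estimates


# Hypothetical observed history of (state, action, next_state) triples.
sample = [(0, "a", 1), (0, "a", 1), (0, "a", 0), (1, "b", 0)]
p_hat = frequency_estimates(sample)
```

Such estimates can only become accurate for state-action pairs that are visited often, which is why the abstract restricts attention to a class of policies under which consistency holds on an essential set of pairs.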

Time-discretization for controlled Markov processes. I. General approximation results

Nico M. van Dijk, Arie Hordijk (1996)

Kybernetika

Ergodic control of partially observed Markov processes with equivalent transition probabilities

Łukasz Stettner (1993)

Applicationes Mathematicae

Optimal control of a partially observed Markov process with a long-run average cost functional is considered. Under the assumption that the transition probabilities are equivalent, the existence of a solution to the Bellman equation is shown, and this solution is used to construct optimal strategies.