Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains

Raúl Montes-de-Oca; Francisco Salem-Silva

Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains

Raúl Montes-de-Oca; Francisco Salem-Silva

Kybernetika (2005)

Volume: 41, Issue: 6, page [757]-772
ISSN: 0023-5954

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

This paper deals with Markov decision processes (MDPs) with real state space for which its minimum is attained, and that are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs, and to evaluate the quality of each policy, we use the objective function known as the average cost. For this objective function we consider two Markov control models

ℙ

and

ℙ_{1}

.

ℙ

and

ℙ_{1}

have the same components except for the transition laws. The transition

q

of

ℙ

is taken as unknown, and the transition

q_{1}

of

ℙ_{1}

, as a known approximation of

q

. Under certain irreducibility, recurrence and ergodic conditions imposed on the bounding SO Markov chain (these conditions give the rate of convergence of the transition probability in

t

-steps,

t = 1, 2, ...

to the invariant measure), the difference between the optimal cost to drive

ℙ

and the cost obtained to drive

ℙ

using the optimal policy of

ℙ_{1}

is estimated. That difference is defined as the index of perturbations, and in this work upper bounds of it are provided. An example to illustrate the theory developed here is added.

How to cite

top

MLA
BibTeX
RIS

Montes-de-Oca, Raúl, and Salem-Silva, Francisco. "Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains." Kybernetika 41.6 (2005): [757]-772. <http://eudml.org/doc/33786>.

@article{Montes2005,
abstract = {This paper deals with Markov decision processes (MDPs) with real state space for which its minimum is attained, and that are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs, and to evaluate the quality of each policy, we use the objective function known as the average cost. For this objective function we consider two Markov control models $\{\mathbb \{P\}\}$ and $\{\mathbb \{P\}\}_\{1\}$. $\mathbb \{P\}$ and $\{\mathbb \{P\}\}_\{1\}$ have the same components except for the transition laws. The transition $q$ of $\mathbb \{P\}$ is taken as unknown, and the transition $q_\{1\}$ of $\{\mathbb \{P\}\}_\{1\}$, as a known approximation of $q$. Under certain irreducibility, recurrence and ergodic conditions imposed on the bounding SO Markov chain (these conditions give the rate of convergence of the transition probability in $t$-steps, $t=1,2,\ldots $ to the invariant measure), the difference between the optimal cost to drive $\mathbb \{P\}$ and the cost obtained to drive $\mathbb \{P\}$ using the optimal policy of $\{\mathbb \{P\}\}_\{1\}$ is estimated. That difference is defined as the index of perturbations, and in this work upper bounds of it are provided. An example to illustrate the theory developed here is added.},
author = {Montes-de-Oca, Raúl, Salem-Silva, Francisco},
journal = {Kybernetika},
keywords = {stochastically ordered Markov chains; Lyapunov condition; invariant probability; average Markov decision processes; stochastically ordered Markov chains; Lyapunov condition; invariant probability},
language = {eng},
number = {6},
pages = {[757]-772},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains},
url = {http://eudml.org/doc/33786},
volume = {41},
year = {2005},
}

TY - JOUR
AU - Montes-de-Oca, Raúl
AU - Salem-Silva, Francisco
TI - Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains
JO - Kybernetika
PY - 2005
PB - Institute of Information Theory and Automation AS CR
VL - 41
IS - 6
SP - [757]
EP - 772
AB - This paper deals with Markov decision processes (MDPs) with real state space for which its minimum is attained, and that are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs, and to evaluate the quality of each policy, we use the objective function known as the average cost. For this objective function we consider two Markov control models ${\mathbb {P}}$ and ${\mathbb {P}}_{1}$. $\mathbb {P}$ and ${\mathbb {P}}_{1}$ have the same components except for the transition laws. The transition $q$ of $\mathbb {P}$ is taken as unknown, and the transition $q_{1}$ of ${\mathbb {P}}_{1}$, as a known approximation of $q$. Under certain irreducibility, recurrence and ergodic conditions imposed on the bounding SO Markov chain (these conditions give the rate of convergence of the transition probability in $t$-steps, $t=1,2,\ldots $ to the invariant measure), the difference between the optimal cost to drive $\mathbb {P}$ and the cost obtained to drive $\mathbb {P}$ using the optimal policy of ${\mathbb {P}}_{1}$ is estimated. That difference is defined as the index of perturbations, and in this work upper bounds of it are provided. An example to illustrate the theory developed here is added.
LA - eng
KW - stochastically ordered Markov chains; Lyapunov condition; invariant probability; average Markov decision processes; stochastically ordered Markov chains; Lyapunov condition; invariant probability
UR - http://eudml.org/doc/33786
ER -

References

top

Favero F., Runglandier W. J., 10.1016/S0167-6911(02)00121-4, Systems Control Lett. 46 (2002), 91–97 MR2010062 DOI10.1016/S0167-6911(02)00121-4
Gordienko E. I., 10.1007/BF01099115, J. Soviet Math. 50 (1992), 891–899 (1992) MR1163393 DOI10.1007/BF01099115
Gordienko E. I., Lecture Notes on Stability Estimation in Markov Decision Processes, Universidad Autónoma Metropolitana, México D.F., 1994
Gordienko E. I., Hernández-Lerma O., Average cost Markov control processes with weighted norms: value iteration, Appl. Math. 23 (1995), 219–237 (1995) Zbl0829.93068 MR1341224
Gordienko E. I., Salem-Silva F. S., 10.1016/S0167-6911(97)00077-7, Systems Control Lett. 33 (1998), 125–130 (1998) MR1607814 DOI10.1016/S0167-6911(97)00077-7
Gordienko E. I., Salem-Silva F. S., Estimates of stability of Markov control processes with unbounded costs, Kybernetika 36 (2000), 2, 195–210 MR1760024
Hernández-Lerma O., Adaptive Markov Control Processes, Springer–Verlag, New York 1989 MR0995463
Hernández-Lerma O., Lasserre J. B., Further Topics on Discrete-Time Markov Control Processes, Springer–Verlag, New York 1999 Zbl0928.93002 MR1697198
Hinderer K., Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, (Lectures Notes in Operations Research and Mathematical Systems 33.) Springer–Verlag, Berlin – Heidelberg – New York 1970 Zbl0202.18401 MR0267890
Lindvall T., Lectures on the Coupling Method, (Wiley Series in Probability and Mathematical Statistics.) Wiley, New York 1992 Zbl1013.60001 MR1180522
Lund R., 10.2307/3215107, J. Appl. Probab. 34 (1997), 806–811 (1997) MR1464616 DOI10.2307/3215107
Lund R., Tweedie R., 10.1287/moor.21.1.182, Math. Oper. Res. 20 (1996), 182–194 (1996) Zbl0847.60053 MR1385873 DOI10.1287/moor.21.1.182
Meyn S., Tweedie R., Markov Chains and Stochastic Stability, Springer–Verlag, New York 1993 Zbl1165.60001 MR1287609
Montes-de-Oca R., Sakhanenko, A., Salem-Silva F., Estimates for perturbations of general discounted Markov control chains, Appl. Math. 30 (2003), 3, 287–304 Zbl1055.90086 MR2029538
Nummelin E., General Irreducible Markov Chains and Non-negative Operators, Cambrigde University Press, Cambridge 1984 Zbl0551.60066 MR0776608
Rachev S. T., Probability Metrics and the Stability of Stochastic Models, Wiley, New York 1991 Zbl0744.60004 MR1105086
Zolotarev V. M., On stochastic continuity of queueing systems of type G/G/1, Theory Probab. Appl. 21 (1976), 250–269 (1976) Zbl0363.60090 MR0420920

Citations in EuDML Documents

top

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.