Identification of optimal policies in Markov decision processes
Kybernetika (2010)
- Volume: 46, Issue: 3, pages 558-570
- ISSN: 0023-5954
Abstract
In this note we focus attention on identifying optimal policies and on the elimination of suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds on the optimal values, as well as on the value of the current policy. Using these modified value iterations it is possible to eliminate suboptimal actions and to identify an optimal policy, or nearly optimal policies, in a finite number of steps without knowing the precise values of the performance function.
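The bound-and-eliminate scheme the abstract describes can be made concrete. Below is a minimal sketch, assuming a cost-minimizing discounted MDP with finite state and action sets and MacQueen-type lower/upper bounds (cf. the MacQueen 1966/1967 and Puterman-Shin references below); the function name, array layout, and tolerance handling are illustrative choices, not taken from the paper.

```python
import numpy as np

def value_iteration_with_elimination(c, P, beta, eps=1e-6, max_iter=10_000):
    """Value iteration for a finite, discounted, cost-minimizing MDP.

    c[i, a]    : one-stage cost of action a in state i
    P[i, a, j] : transition probability i -> j under action a
    beta       : discount factor in (0, 1)

    Maintains MacQueen-style lower/upper bounds on the optimal values
    and permanently removes actions whose optimistic Q-value already
    exceeds the pessimistic bound on the optimal value.
    """
    n_states, n_actions = c.shape
    active = [list(range(n_actions)) for _ in range(n_states)]  # surviving actions
    v = np.zeros(n_states)

    for _ in range(max_iter):
        # Bellman update restricted to the surviving actions
        q = np.full((n_states, n_actions), np.inf)
        for i in range(n_states):
            for a in active[i]:
                q[i, a] = c[i, a] + beta * P[i, a] @ v
        v_new = q.min(axis=1)

        # MacQueen bounds: lower <= v* <= upper (elementwise)
        diff = v_new - v
        lower = v_new + beta / (1.0 - beta) * diff.min()
        upper = v_new + beta / (1.0 - beta) * diff.max()

        # Eliminate action a in state i if even its most optimistic
        # Q-value (using the lower bound on v*) exceeds upper[i] >= v*(i)
        for i in range(n_states):
            active[i] = [a for a in active[i]
                         if c[i, a] + beta * P[i, a] @ lower <= upper[i]]

        if diff.max() - diff.min() < eps * (1.0 - beta) / beta:
            break  # gap between the bounds is below eps
        v = v_new

    policy = np.array([min(acts, key=lambda a: q[i, a])
                       for i, acts in enumerate(active)])
    return lower, upper, policy, active
```

The elimination test is safe in this setting: an optimal action always satisfies c(i,a) + beta * P(i,a) @ lower <= v*(i) <= upper[i], so it is never removed, while any removed action can never become optimal later. This is how an optimal (or nearly optimal) policy can be identified in finitely many steps without knowing the optimal values exactly.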
How to cite
Sladký, Karel. "Identification of optimal policies in Markov decision processes." Kybernetika 46.3 (2010): 558-570. <http://eudml.org/doc/196935>.
@article{Sladký2010,
abstract = {In this note we focus attention on identifying optimal policies and on the elimination of suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds on the optimal values, as well as on the value of the current policy. Using these modified value iterations it is possible to eliminate suboptimal actions and to identify an optimal policy, or nearly optimal policies, in a finite number of steps without knowing the precise values of the performance function.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {finite state Markov decision processes; discounted and average costs; elimination of suboptimal policies},
language = {eng},
number = {3},
pages = {558-570},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Identification of optimal policies in Markov decision processes},
url = {http://eudml.org/doc/196935},
volume = {46},
year = {2010},
}
TY - JOUR
AU - Sladký, Karel
TI - Identification of optimal policies in Markov decision processes
JO - Kybernetika
PY - 2010
PB - Institute of Information Theory and Automation AS CR
VL - 46
IS - 3
SP - 558
EP - 570
AB - In this note we focus attention on identifying optimal policies and on the elimination of suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds on the optimal values, as well as on the value of the current policy. Using these modified value iterations it is possible to eliminate suboptimal actions and to identify an optimal policy, or nearly optimal policies, in a finite number of steps without knowing the precise values of the performance function.
LA - eng
KW - finite state Markov decision processes; discounted and average costs; elimination of suboptimal policies
UR - http://eudml.org/doc/196935
ER -
References
- Cruz-Suárez, D., Montes-de-Oca, R., Uniform convergence of the value iteration policies for discounted Markov decision processes, Bol. de la Soc. Mat. Mexicana 12 (2006), 133–148. MR 2301750
- Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F., Uniform approximations of discounted Markov decision processes to optimal policies, Proceedings of Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), Matfyzpress, Prague 2006, pp. 278–287.
- Grinold, J., Elimination of suboptimal actions in Markov decision problems, Oper. Res. 21 (1973), 848–851. Zbl 0259.90053, MR 0359828, DOI 10.1287/opre.21.3.848
- Hastings, N. A. J., Bounds on the gain of a Markov decision process, Oper. Res. 19 (1971), 240–243. DOI 10.1287/opre.19.1.240
- Hastings, N. A. J., Mello, J., Tests for suboptimal actions in discounted Markov programming, Manag. Sci. 19 (1973), 1019–1022. Zbl 0304.90116, MR 0334959, DOI 10.1287/mnsc.19.9.1019
- Hastings, N. A. J., Mello, J., Tests for nonoptimal actions in undiscounted finite Markov decision chains, Manag. Sci. 23 (1976), 87–91. MR 0439034, DOI 10.1287/mnsc.23.1.87
- MacQueen, J., A modified dynamic programming method for Markovian decision problems, J. Math. Anal. Appl. 14 (1966), 38–43. MR 0191707, DOI 10.1016/0022-247X(66)90060-6
- MacQueen, J., A test for suboptimal actions in Markovian decision problems, Oper. Res. 15 (1967), 559–561. DOI 10.1287/opre.15.3.559
- Odoni, A. R., On finding the maximal gain for Markov decision processes, Oper. Res. 17 (1969), 857–860. Zbl 0184.23202, MR 0284192, DOI 10.1287/opre.17.5.857
- Puterman, M. L., Shin, M. C., Modified policy iteration algorithms for discounted Markov decision problems, Manag. Sci. 24 (1978), 1127–1137. Zbl 0391.90093, MR 0521502, DOI 10.1287/mnsc.24.11.1127
- Puterman, M. L., Shin, M. C., Action elimination procedures for modified policy iteration algorithms, Oper. Res. 30 (1982), 301–318. MR 0653253, DOI 10.1287/opre.30.2.301
- Puterman, M. L., Markov Decision Processes – Discrete Stochastic Dynamic Programming, Wiley, New York 1994. Zbl 1184.90170, MR 1270015
- Sladký, K., O metodě postupných aproximací pro nalezení optimálního řízení markovského řetězce (On the successive approximation method for finding optimal control of a Markov chain), Kybernetika 4 (1969), 2, 167–176.
- White, D. J., Dynamic programming, Markov chains, and the method of successive approximations, J. Math. Anal. Appl. 6 (1963), 296–306. MR 0148480, DOI 10.1016/0022-247X(63)90017-9