Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation
Kybernetika (2013)
- Volume: 49, Issue: 5, page 705-719
- ISSN: 0023-5954
Abstract
This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line, not necessarily finite or denumerable. The cost function may be unbounded, the dynamic is independent of the current state, and the decision sets may be non-compact. In this context, conditions are provided under which an increasing or a decreasing optimal stationary policy exists; these conditions require no convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
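As a rough illustration of the kind of procedure the abstract describes, the following Python sketch runs policy iteration on a hypothetical finite discounted-cost MDP whose transition law depends only on the action chosen, mirroring the paper's state-independence assumption. All model data here (the grid sizes, the random transition kernel `P`, and the submodular cost `c`) are invented for illustration; the paper itself treats real, possibly non-compact state and action spaces and possibly unbounded costs, which this toy example does not capture.

```python
import numpy as np

# Minimal policy-iteration sketch (illustrative only, not the paper's
# algorithm): finite state/action grids, discounted cost criterion,
# and a transition law p(y | a) independent of the current state.

n_states, n_actions = 6, 4
beta = 0.9                          # discount factor

rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states))   # rows indexed by action only
P /= P.sum(axis=1, keepdims=True)       # each row is a distribution over y

# Hypothetical submodular cost c(x, a) = (x - a)^2 + 0.1 a: its cross
# differences are negative, so larger states favor larger actions.
x = np.arange(n_states)[:, None]
a = np.arange(n_actions)[None, :]
c = (x - a) ** 2 + 0.1 * a

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve V = c_f + beta * P_f V exactly.
    P_f = P[policy]                             # row x is p(. | f(x))
    c_f = c[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - beta * P_f, c_f)
    # Policy improvement: greedy one-step lookahead.
    Q = c + beta * (P @ V)                      # Q[x, a]
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal stationary policy:", policy)     # nondecreasing in x here
```

Because `c` is submodular and the expected continuation term `P @ V` does not depend on the state, the greedy improvement step selects nondecreasing minimizers, so the printed policy comes out increasing in the state: the structural property the paper's conditions guarantee in far greater generality.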
How to cite
Flores-Hernández, Rosa María. "Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation." Kybernetika 49.5 (2013): 705-719. <http://eudml.org/doc/260635>.
@article{Flores2013,
abstract = {This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line, not necessarily finite or denumerable. The cost function may be unbounded, the dynamic is independent of the current state, and the decision sets may be non-compact. In this context, conditions are provided under which an increasing or a decreasing optimal stationary policy exists; these conditions require no convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.},
author = {Flores-Hernández, Rosa María},
journal = {Kybernetika},
keywords = {Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm},
language = {eng},
number = {5},
pages = {705-719},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation},
url = {http://eudml.org/doc/260635},
volume = {49},
year = {2013},
}
TY - JOUR
AU - Flores-Hernández, Rosa María
TI - Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation
JO - Kybernetika
PY - 2013
PB - Institute of Information Theory and Automation AS CR
VL - 49
IS - 5
SP - 705
EP - 719
AB - This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line, not necessarily finite or denumerable. The cost function may be unbounded, the dynamic is independent of the current state, and the decision sets may be non-compact. In this context, conditions are provided under which an increasing or a decreasing optimal stationary policy exists; these conditions require no convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
LA - eng
KW - Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm
UR - http://eudml.org/doc/260635
ER -
References
- Assaf, D., Adv. in Appl. Probab. 10 (1978), 472-490. Zbl 0388.49016, MR 0489919, DOI 10.2307/1426946.
- Bäuerle, N., Rieder, U., Markov Decision Processes with Applications to Finance. Springer-Verlag, Berlin - Heidelberg 2011. Zbl 1236.90004, MR 2808878.
- Bertsekas, D. P., Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, New Jersey 1987. Zbl 0649.93001, MR 0896902.
- Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F., Math. Methods Oper. Res. 60 (2004), 415-436. Zbl 1104.90053, MR 2106092, DOI 10.1007/s001860400372.
- Dragut, A., Structured optimal policies for Markov decision processes: lattice programming techniques. In: Wiley Encyclopedia of Operations Research and Management Science (J. J. Cochran, ed.), John Wiley and Sons, 2010, pp. 1-25.
- Duffie, D., Security Markets. Academic Press, San Diego 1988. Zbl 0861.90019, MR 0955269.
- Flores-Hernández, R. M., Montes-de-Oca, R., Monotonicity of minimizers in optimization problems with applications to Markov control processes. Kybernetika 43 (2007), 347-368. Zbl 1170.90513, MR 2362724.
- Hernández-Lerma, O., Lasserre, J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. Zbl 0840.93001, MR 1363487.
- Heyman, D. P., Sobel, M. J., Stochastic Models in Operations Research, Vol. II: Stochastic Optimization. McGraw-Hill, New York 1984. Zbl 0531.90062.
- Jaśkiewicz, A., Syst. Control Lett. 56 (2007), 663-668. Zbl 1120.49020, MR 2356450, DOI 10.1016/j.sysconle.2007.06.006.
- Jaśkiewicz, A., Nowak, A. S., J. Math. Anal. Appl. 378 (2011), 450-462. Zbl 1254.90292, MR 2773257, DOI 10.1016/j.jmaa.2010.08.073.
- Mendelssohn, R., Sobel, M. J., J. Econom. Theory 23 (1980), 243-260. Zbl 0472.90015, DOI 10.1016/0022-0531(80)90009-5.
- Puterman, M. L., Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York 1994. Zbl 1184.90170, MR 1270015.
- Topkis, D. M., Supermodularity and Complementarity. Princeton University Press, Princeton, New Jersey 1998. MR 1614637.