Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation
Kybernetika (2013)
- Volume: 49, Issue: 5, page 705-719
- ISSN: 0023-5954
Abstract
This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line, not necessarily finite or denumerable. The cost function may be unbounded, the dynamic is independent of the current state, and the decision sets may be non-compact. In this context, conditions are provided under which an increasing or a decreasing optimal stationary policy exists; these conditions require no convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
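As a rough illustration of the kind of procedure the abstract describes, the following Python sketch runs policy iteration on a hypothetical finite discounted-cost MDP whose transition law depends only on the action chosen, mirroring the paper's state-independence assumption. All model data here (the grid sizes, the random transition kernel `P`, and the submodular cost `c`) are invented for illustration; the paper itself treats real, possibly non-compact state and action spaces and possibly unbounded costs, which this toy example does not capture.

```python
import numpy as np

# Minimal policy-iteration sketch (illustrative only, not the paper's
# algorithm): finite state/action grids, discounted cost criterion,
# and a transition law p(y | a) independent of the current state.

n_states, n_actions = 6, 4
beta = 0.9                          # discount factor

rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states))   # rows indexed by action only
P /= P.sum(axis=1, keepdims=True)       # each row is a distribution over y

# Hypothetical submodular cost c(x, a) = (x - a)^2 + 0.1 a: its cross
# differences are negative, so larger states favor larger actions.
x = np.arange(n_states)[:, None]
a = np.arange(n_actions)[None, :]
c = (x - a) ** 2 + 0.1 * a

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve V = c_f + beta * P_f V exactly.
    P_f = P[policy]                             # row x is p(. | f(x))
    c_f = c[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - beta * P_f, c_f)
    # Policy improvement: greedy one-step lookahead.
    Q = c + beta * (P @ V)                      # Q[x, a]
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal stationary policy:", policy)     # nondecreasing in x here
```

Because `c` is submodular and the expected continuation term `P @ V` does not depend on the state, the greedy improvement step selects nondecreasing minimizers, so the printed policy comes out increasing in the state: the structural property the paper's conditions guarantee in far greater generality.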
How to cite
Flores-Hernández, Rosa María. "Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation." Kybernetika 49.5 (2013): 705-719. <http://eudml.org/doc/260635>.
@article{Flores2013,
abstract = {This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line, not necessarily finite or denumerable. The cost function may be unbounded, the dynamic is independent of the current state, and the decision sets may be non-compact. In this context, conditions are provided under which an increasing or a decreasing optimal stationary policy exists; these conditions require no convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.},
author = {Flores-Hernández, Rosa María},
journal = {Kybernetika},
keywords = {Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm},
language = {eng},
number = {5},
pages = {705-719},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation},
url = {http://eudml.org/doc/260635},
volume = {49},
year = {2013},
}
TY - JOUR
AU - Flores-Hernández, Rosa María
TI - Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation
JO - Kybernetika
PY - 2013
PB - Institute of Information Theory and Automation AS CR
VL - 49
IS - 5
SP - 705
EP - 719
AB - This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function and with state and decision spaces that are subsets of the real line, not necessarily finite or denumerable. The cost function may be unbounded, the dynamic is independent of the current state, and the decision sets may be non-compact. In this context, conditions are provided under which an increasing or a decreasing optimal stationary policy exists; these conditions require no convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
LA - eng
KW - Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm
UR - http://eudml.org/doc/260635
ER -
References
- Assaf, D., Adv. in Appl. Probab. 10 (1978), 472-490. Zbl 0388.49016, MR 0489919, DOI 10.2307/1426946.
- Bäuerle, N., Rieder, U., Markov Decision Processes with Applications to Finance. Springer-Verlag, Berlin - Heidelberg 2011. Zbl 1236.90004, MR 2808878.
- Bertsekas, D. P., Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, New Jersey 1987. Zbl 0649.93001, MR 0896902.
- Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F., Math. Methods Oper. Res. 60 (2004), 415-436. Zbl 1104.90053, MR 2106092, DOI 10.1007/s001860400372.
- Dragut, A., Structured optimal policies for Markov decision processes: lattice programming techniques. In: Wiley Encyclopedia of Operations Research and Management Science (J. J. Cochran, ed.), John Wiley and Sons, 2010, pp. 1-25.
- Duffie, D., Security Markets. Academic Press, San Diego 1988. Zbl 0861.90019, MR 0955269.
- Flores-Hernández, R. M., Montes-de-Oca, R., Monotonicity of minimizers in optimization problems with applications to Markov control processes. Kybernetika 43 (2007), 347-368. Zbl 1170.90513, MR 2362724.
- Hernández-Lerma, O., Lasserre, J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996. Zbl 0840.93001, MR 1363487.
- Heyman, D. P., Sobel, M. J., Stochastic Models in Operations Research, Vol. II: Stochastic Optimization. McGraw-Hill, New York 1984. Zbl 0531.90062.
- Jaśkiewicz, A., Syst. Control Lett. 56 (2007), 663-668. Zbl 1120.49020, MR 2356450, DOI 10.1016/j.sysconle.2007.06.006.
- Jaśkiewicz, A., Nowak, A. S., J. Math. Anal. Appl. 378 (2011), 450-462. Zbl 1254.90292, MR 2773257, DOI 10.1016/j.jmaa.2010.08.073.
- Mendelssohn, R., Sobel, M. J., J. Econom. Theory 23 (1980), 243-260. Zbl 0472.90015, DOI 10.1016/0022-0531(80)90009-5.
- Puterman, M. L., Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York 1994. Zbl 1184.90170, MR 1270015.
- Topkis, D. M., Supermodularity and Complementarity. Princeton University Press, Princeton, New Jersey 1998. MR 1614637.