Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation

Rosa María Flores-Hernández

Kybernetika (2013)

  • Volume: 49, Issue: 5, pages 705-719
  • ISSN: 0023-5954

Abstract

This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function, whose state and decision spaces are subsets of the real line but are not necessarily finite or denumerable. The cost function is possibly unbounded, the dynamics are independent of the current state, and the decision sets are possibly non-compact. In this setting, conditions under which either an increasing or a decreasing optimal stationary policy exists are provided; these conditions do not require convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
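To make the setting concrete, the following is a minimal sketch (in Python with NumPy) of standard policy iteration specialized to a finite discounted-cost MDP whose transition law depends only on the chosen action, P(y | x, a) = P(y | a), as in the model described above. It is not the paper's monotone version of the PIA: the state and action grids, the quadratic cost c(x, a) = (x - a)^2, and the transition kernel below are illustrative assumptions, chosen so that the cost is submodular and a (weakly) increasing optimal policy is expected.

import numpy as np

# Minimal illustrative sketch, not the paper's exact algorithm: policy
# iteration for a finite discounted-cost MDP whose transition law depends
# only on the action, i.e. P(y | x, a) = P(y | a).

alpha = 0.9                                  # discount factor
S = np.linspace(0.0, 1.0, 21)                # state grid (subset of the real line)
A = np.linspace(0.0, 1.0, 21)                # action grid
nS, nA = len(S), len(A)

# Hypothetical submodular cost c(x, a) = (x - a)^2, which favours larger
# actions in larger states.
C = (S[:, None] - A[None, :]) ** 2           # C[i, j] = c(x_i, a_j)

# Transition kernel independent of the current state: row j is P(. | a_j),
# a discretized distribution concentrated around the chosen action a_j.
P = np.exp(-((S[None, :] - A[:, None]) ** 2) / 0.02)
P /= P.sum(axis=1, keepdims=True)            # normalise each row

def evaluate(policy):
    """Policy evaluation: solve (I - alpha * P_f) V = c_f for a stationary policy."""
    P_f = P[policy]                          # P_f[i, :] = P(. | policy(x_i))
    c_f = C[np.arange(nS), policy]           # c_f[i] = c(x_i, policy(x_i))
    return np.linalg.solve(np.eye(nS) - alpha * P_f, c_f)

def improve(V):
    """Policy improvement; np.argmin breaks ties toward the smallest action."""
    Q = C + alpha * (P @ V)[None, :]         # Q[i, j] = c(x_i, a_j) + alpha * E[V | a_j]
    return Q.argmin(axis=1)

policy = np.zeros(nS, dtype=int)             # start from the smallest action everywhere
for _ in range(100):
    V = evaluate(policy)
    new_policy = improve(V)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

# Check that the computed policy is (weakly) increasing on the state grid.
print("optimal policy increasing:", bool(np.all(np.diff(A[policy]) >= -1e-12)))

In this example the final check succeeds because, with transitions independent of the current state, the improvement step minimizes a function of the state-action pair that is submodular in (x, a), so the smallest minimizer is nondecreasing in the state.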

How to cite


Flores-Hernández, Rosa María. "Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation." Kybernetika 49.5 (2013): 705-719. <http://eudml.org/doc/260635>.

@article{Flores2013,
abstract = {This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function, whose state and decision spaces are subsets of the real line but are not necessarily finite or denumerable. The cost function is possibly unbounded, the dynamics are independent of the current state, and the decision sets are possibly non-compact. In this setting, conditions under which either an increasing or a decreasing optimal stationary policy exists are provided; these conditions do not require convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.},
author = {Flores-Hernández, Rosa María},
journal = {Kybernetika},
keywords = {Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm},
language = {eng},
number = {5},
pages = {705-719},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation},
url = {http://eudml.org/doc/260635},
volume = {49},
year = {2013},
}

TY - JOUR
AU - Flores-Hernández, Rosa María
TI - Monotone optimal policies in discounted Markov decision processes with transition probabilities independent of the current state: existence and approximation
JO - Kybernetika
PY - 2013
PB - Institute of Information Theory and Automation AS CR
VL - 49
IS - 5
SP - 705
EP - 719
AB - This paper considers Markov decision processes (MDPs) with the discounted cost as the objective function, whose state and decision spaces are subsets of the real line but are not necessarily finite or denumerable. The cost function is possibly unbounded, the dynamics are independent of the current state, and the decision sets are possibly non-compact. In this setting, conditions under which either an increasing or a decreasing optimal stationary policy exists are provided; these conditions do not require convexity assumptions. Versions of the policy iteration algorithm (PIA) that approximate increasing or decreasing optimal stationary policies are detailed, and an illustrative example is presented. Finally, comments are given on the monotonicity conditions and on the monotone versions of the PIA as applied to discounted MDPs with rewards.
LA - eng
KW - Markov decision process; total discounted cost; total discounted reward; increasing optimal policy; decreasing optimal policy; policy iteration algorithm
UR - http://eudml.org/doc/260635
ER -
