Second Order optimality in Markov decision chains

Karel Sladký

Kybernetika (2017)

  • Volume: 53, Issue: 6, page 1086-1099
  • ISSN: 0023-5954

Abstract

The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (so-called first passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. To this end it seems preferable, if not necessary, to select more sophisticated criteria that also reflect the variability-risk features of the problem. Perhaps the best known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e. optimizing a weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances for transient and discounted models (where the value of the discount factor depends on the current state and the action taken) over finite and infinite time horizons. The same result is presented for long-run average nondiscounted models, where finding stationary policies minimizing the average variance in the class of policies with a given long-run average reward is discussed.
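
As a concrete illustration of the mean-variance criterion described above, the following sketch computes the exact mean and variance of the total discounted reward for a fixed stationary policy with a constant discount factor. This is only a special case of the state- and action-dependent discounting treated in the paper, and it uses the standard second-moment recursion in the spirit of Sobel [12]; the transition matrix P, reward vector r, discount factor beta and risk weight lam below are hypothetical illustrative values, not taken from the article.

    import numpy as np

    # Hypothetical 3-state Markov reward chain under a fixed stationary policy.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.7, 0.2],
                  [0.2, 0.2, 0.6]])   # transition matrix
    r = np.array([1.0, 2.0, 0.5])     # one-stage rewards
    beta = 0.9                        # constant discount factor

    I = np.eye(3)

    # Mean of the total discounted reward V = sum_t beta^t r(X_t):
    # v solves v = r + beta P v, i.e. v = (I - beta P)^{-1} r.
    v = np.linalg.solve(I - beta * P, r)

    # Second moment U_i = E[V^2 | X_0 = i]: conditioning on the first transition,
    # U = d + beta^2 P U with d_i = r_i^2 + 2 beta r_i (P v)_i,
    # hence U = (I - beta^2 P)^{-1} d and Var[V | X_0 = i] = U_i - v_i^2.
    d = r**2 + 2.0 * beta * r * (P @ v)
    U = np.linalg.solve(I - beta**2 * P, d)
    var = U - v**2

    # Markowitz-style weighted objective: mean reward penalized by variance.
    lam = 0.5
    print("mean:", v)
    print("variance:", var)
    print("mean-variance score:", v - lam * var)

Comparing policies by the score v - lam * var is the kind of weighted sum of reward and variance mentioned in the abstract; the paper's formulae extend this computation to state- and action-dependent discount factors and to the long-run average setting.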

How to cite


Sladký, Karel. "Second Order optimality in Markov decision chains." Kybernetika 53.6 (2017): 1086-1099. <http://eudml.org/doc/294847>.

@article{Sladký2017,
abstract = {The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (so-called first passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. To this end it seems preferable, if not necessary, to select more sophisticated criteria that also reflect the variability-risk features of the problem. Perhaps the best known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e. optimizing a weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances for transient and discounted models (where the value of the discount factor depends on the current state and the action taken) over finite and infinite time horizons. The same result is presented for long-run average nondiscounted models, where finding stationary policies minimizing the average variance in the class of policies with a given long-run average reward is discussed.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {Markov decision chains; second order optimality; optimality conditions for transient, discounted and average models; policy iterations; value iterations},
language = {eng},
number = {6},
pages = {1086-1099},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Second Order optimality in Markov decision chains},
url = {http://eudml.org/doc/294847},
volume = {53},
year = {2017},
}

TY - JOUR
AU - Sladký, Karel
TI - Second Order optimality in Markov decision chains
JO - Kybernetika
PY - 2017
PB - Institute of Information Theory and Automation AS CR
VL - 53
IS - 6
SP - 1086
EP - 1099
AB - The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (so-called first passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. To this end it seems preferable, if not necessary, to select more sophisticated criteria that also reflect the variability-risk features of the problem. Perhaps the best known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e. optimizing a weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances for transient and discounted models (where the value of the discount factor depends on the current state and the action taken) over finite and infinite time horizons. The same result is presented for long-run average nondiscounted models, where finding stationary policies minimizing the average variance in the class of policies with a given long-run average reward is discussed.
LA - eng
KW - Markov decision chains; second order optimality; optimality conditions for transient, discounted and average models; policy iterations; value iterations
UR - http://eudml.org/doc/294847
ER -

References

  1. Feinberg, E. A., Fei, J., An inequality for variances of the discounted rewards., J. Appl. Probab. 46 (2009), 1209-1212. MR 2582716, DOI 10.1239/jap/1261670699
  2. Gantmakher, F. R., The Theory of Matrices., Chelsea, London 1959. MR 0107649
  3. Jaquette, S. C., Markov decision processes with a new optimality criterion: discrete time., Ann. Statist. 1 (1973), 496-505. MR 0378839
  4. Mandl, P., On the variance in controlled Markov chains., Kybernetika 7 (1971), 1-12. Zbl 0215.25902, MR 0286178
  5. Markowitz, H., Portfolio Selection - Efficient Diversification of Investments., Wiley, New York 1959. MR 0103768
  6. Puterman, M. L., Markov Decision Processes - Discrete Stochastic Dynamic Programming., Wiley, New York 1994. MR 1270015
  7. Bäuerle, N., Rieder, U., Markov Decision Processes with Application to Finance., Springer-Verlag, Berlin 2011. MR 2808878
  8. Righter, R., Stochastic comparison of discounted rewards., J. Appl. Probab. 48 (2011), 293-294. MR 2809902
  9. Sladký, K., On mean reward variance in semi-Markov processes., Math. Meth. Oper. Res. 62 (2005), 387-397. MR 2229697
  10. Sladký, K., Risk-sensitive and mean variance optimality in Markov decision processes., Acta Oeconomica Pragensia 7 (2013), 146-161. 
  11. Sladký, K., Second order optimality in transient and discounted Markov decision chains., In: Proc. 33rd Internat. Conf. Math. Methods in Economics MME 2015 (D. Martinčík, ed.), University of West Bohemia, Plzeň 2015, pp. 731-736.
  12. Sobel, M., The variance of discounted Markov decision processes., J. Appl. Probab. 19 (1982), 794-802. Zbl 0503.90091, MR 0675143
  13. van Dijk, N. M., Sladký, K., On the total reward variance for continuous-time Markov reward chains., J. Appl. Probab. 43 (2006), 1044-1052. MR 2274635
  14. Veinott, A. F., Jr., Discrete dynamic programming with sensitive discount optimality criteria., Ann. Math. Statist. 40 (1969), 1635-1660. MR 0256712
  15. White, D. J., Mean, variance, and probabilistic criteria in finite Markov decision processes: a review., J. Optimizat. Th. Appl. 56 (1988), 1-29. MR 0922375
  16. Wu, X., Guo, X., First passage optimality and variance minimisation of Markov decision processes with varying discount factors., J. Appl. Probab. 52 (2015), 441-456. Zbl 1327.90374, MR 3372085
