Growth rates and average optimality in risk-sensitive Markov decision chains

Karel Sladký

Kybernetika (2008)

  • Volume: 44, Issue: 2, page 205-226
  • ISSN: 0023-5954

Abstract

top
In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.

How to cite

top

Sladký, Karel. "Growth rates and average optimality in risk-sensitive Markov decision chains." Kybernetika 44.2 (2008): 205-226. <http://eudml.org/doc/33922>.

@article{Sladký2008,
abstract = {In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {risk-sensitive Markov decision chains; average optimal policies; optimal growth rates; risk-sensitive Markov decision chains; average optimal policies; optimal growth rates},
language = {eng},
number = {2},
pages = {205-226},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Growth rates and average optimality in risk-sensitive Markov decision chains},
url = {http://eudml.org/doc/33922},
volume = {44},
year = {2008},
}

TY - JOUR
AU - Sladký, Karel
TI - Growth rates and average optimality in risk-sensitive Markov decision chains
JO - Kybernetika
PY - 2008
PB - Institute of Information Theory and Automation AS CR
VL - 44
IS - 2
SP - 205
EP - 226
AB - In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.
LA - eng
KW - risk-sensitive Markov decision chains; average optimal policies; optimal growth rates; risk-sensitive Markov decision chains; average optimal policies; optimal growth rates
UR - http://eudml.org/doc/33922
ER -

References

top
  1. Bellman R., On a quasi-linear equation, Canadian J. Math. 8 (1956), 198–202 (1956) Zbl0071.34902MR0077782
  2. Bellman R., Dynamic Programming, Princenton Univ. Press, Princenton, N.J. 1957 Zbl1205.90002MR0090477
  3. Berman A., Plemmons R. J., Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York 1979 Zbl0815.15016MR0544666
  4. Bielecki T. D., Hernández-Hernández, D., Pliska S. R., Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management, Math. Methods Oper. Res. 50 (1999), 167–188 (1999) MR1732397
  5. Cavazos-Cadena R., Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space, Math. Methods Oper. Res. 57 (2003), 253–285 Zbl1023.90076MR1973378
  6. Cavazos-Cadena R., Montes-de-Oca R., The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space, Math. Oper. Res. 28 (2003), 752–756 Zbl1082.90125MR2015911
  7. Cavazos-Cadena R., Montes-de-Oca R., Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion, J. Appl. Probab. 42 (2005), 905–918 Zbl1105.90101MR2203811
  8. Cavazos-Cadena R., Hernández-Hernández D., Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach, Math. Methods Oper. Res. 56 (2002), 473–479 MR1953028
  9. Cavazos-Cadena R., Hernández-Hernández D., A characterization exponential functionals in finite Markov chains, Math. Methods Oper. Res. 60 (2004), 399–414 MR2106091
  10. Cavazos-Cadena R., Hernández-Hernández D., A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains, Ann. Appl. Probab. 15 (2005), 175–212 Zbl1076.93045MR2115041
  11. Gantmakher F. R., The Theory of Matrices, Chelsea, London 1959 
  12. Howard R. A., Matheson J., Risk-sensitive Markov decision processes, Manag. Sci. 23 (1972), 356–369 (1972) Zbl0238.90007MR0292497
  13. Jaquette S. A., A utility criterion for Markov decision processes, Manag. Sci. 23 (1976), 43–49 (1976) Zbl0337.90053MR0439037
  14. Mandl P., Decomposable non-negative matrices in a dynamic programming problem, Czechoslovak Math. J. 20 (1970), 504–510 (1970) Zbl0209.22902MR0264978
  15. Rothblum U. G., Whittle P., Growth optimality for branching Markov decision chains, Math. Oper. Res. 7 (1982), 582–601 (1982) Zbl0498.90082MR0686533
  16. Sladký K., On dynamic programming recursions for multiplicative Markov decision chains, Math. Programming Study 6 (1976), 216–226 (1976) Zbl0374.60089MR0452725
  17. Sladký K., Successive approximation methods for dynamic programming models, In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 (1979) Zbl0496.90081
  18. Sladký K., Bounds on discrete dynamic programming recursions I, Kybernetika 16 (1980), 526–547 (1980) Zbl0454.90085MR0607292
  19. Whittle P., Optimization Over Time – Dynamic Programming and Stochastic Control, Volume II, Chapter 35, Wiley, Chichester 1983 MR0710833
  20. Zijm W. H. M., Nonnegative Matrices in Dynamic Programming, Mathematical Centre Tract, Amsterdam 1983 Zbl0526.90059MR0723868

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.