Growth rates and average optimality in risk-sensitive Markov decision chains

Karel Sladký

Growth rates and average optimality in risk-sensitive Markov decision chains

Karel Sladký

Kybernetika (2008)

Volume: 44, Issue: 2, page 205-226
ISSN: 0023-5954

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.

How to cite

top

MLA
BibTeX
RIS

Sladký, Karel. "Growth rates and average optimality in risk-sensitive Markov decision chains." Kybernetika 44.2 (2008): 205-226. <http://eudml.org/doc/33922>.

@article{Sladký2008,
abstract = {In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {risk-sensitive Markov decision chains; average optimal policies; optimal growth rates; risk-sensitive Markov decision chains; average optimal policies; optimal growth rates},
language = {eng},
number = {2},
pages = {205-226},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Growth rates and average optimality in risk-sensitive Markov decision chains},
url = {http://eudml.org/doc/33922},
volume = {44},
year = {2008},
}

TY - JOUR
AU - Sladký, Karel
TI - Growth rates and average optimality in risk-sensitive Markov decision chains
JO - Kybernetika
PY - 2008
PB - Institute of Information Theory and Automation AS CR
VL - 44
IS - 2
SP - 205
EP - 226
AB - In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.
LA - eng
KW - risk-sensitive Markov decision chains; average optimal policies; optimal growth rates; risk-sensitive Markov decision chains; average optimal policies; optimal growth rates
UR - http://eudml.org/doc/33922
ER -

References

top

Bellman R., On a quasi-linear equation, Canadian J. Math. 8 (1956), 198–202 (1956) Zbl0071.34902 MR0077782
Bellman R., Dynamic Programming, Princenton Univ. Press, Princenton, N.J. 1957 Zbl1205.90002 MR0090477
Berman A., Plemmons R. J., Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York 1979 Zbl0815.15016 MR0544666
Bielecki T. D., Hernández-Hernández, D., Pliska S. R., Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management, Math. Methods Oper. Res. 50 (1999), 167–188 (1999) MR1732397
Cavazos-Cadena R., Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space, Math. Methods Oper. Res. 57 (2003), 253–285 Zbl1023.90076 MR1973378
Cavazos-Cadena R., Montes-de-Oca R., The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space, Math. Oper. Res. 28 (2003), 752–756 Zbl1082.90125 MR2015911
Cavazos-Cadena R., Montes-de-Oca R., Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion, J. Appl. Probab. 42 (2005), 905–918 Zbl1105.90101 MR2203811
Cavazos-Cadena R., Hernández-Hernández D., Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach, Math. Methods Oper. Res. 56 (2002), 473–479 MR1953028
Cavazos-Cadena R., Hernández-Hernández D., A characterization exponential functionals in finite Markov chains, Math. Methods Oper. Res. 60 (2004), 399–414 MR2106091
Cavazos-Cadena R., Hernández-Hernández D., A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains, Ann. Appl. Probab. 15 (2005), 175–212 Zbl1076.93045 MR2115041
Gantmakher F. R., The Theory of Matrices, Chelsea, London 1959
Howard R. A., Matheson J., Risk-sensitive Markov decision processes, Manag. Sci. 23 (1972), 356–369 (1972) Zbl0238.90007 MR0292497
Jaquette S. A., A utility criterion for Markov decision processes, Manag. Sci. 23 (1976), 43–49 (1976) Zbl0337.90053 MR0439037
Mandl P., Decomposable non-negative matrices in a dynamic programming problem, Czechoslovak Math. J. 20 (1970), 504–510 (1970) Zbl0209.22902 MR0264978
Rothblum U. G., Whittle P., Growth optimality for branching Markov decision chains, Math. Oper. Res. 7 (1982), 582–601 (1982) Zbl0498.90082 MR0686533
Sladký K., On dynamic programming recursions for multiplicative Markov decision chains, Math. Programming Study 6 (1976), 216–226 (1976) Zbl0374.60089 MR0452725
Sladký K., Successive approximation methods for dynamic programming models, In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 (1979) Zbl0496.90081
Sladký K., Bounds on discrete dynamic programming recursions I, Kybernetika 16 (1980), 526–547 (1980) Zbl0454.90085 MR0607292
Whittle P., Optimization Over Time – Dynamic Programming and Stochastic Control, Volume II, Chapter 35, Wiley, Chichester 1983 MR0710833
Zijm W. H. M., Nonnegative Matrices in Dynamic Programming, Mathematical Centre Tract, Amsterdam 1983 Zbl0526.90059 MR0723868

Citations in EuDML Documents

top

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.