Growth rates and average optimality in risk-sensitive Markov decision chains
Kybernetika (2008)
- Volume: 44, Issue: 2, page 205-226
- ISSN: 0023-5954
Access Full Article
topAbstract
topHow to cite
topSladký, Karel. "Growth rates and average optimality in risk-sensitive Markov decision chains." Kybernetika 44.2 (2008): 205-226. <http://eudml.org/doc/33922>.
@article{Sladký2008,
abstract = {In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {risk-sensitive Markov decision chains; average optimal policies; optimal growth rates; risk-sensitive Markov decision chains; average optimal policies; optimal growth rates},
language = {eng},
number = {2},
pages = {205-226},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Growth rates and average optimality in risk-sensitive Markov decision chains},
url = {http://eudml.org/doc/33922},
volume = {44},
year = {2008},
}
TY - JOUR
AU - Sladký, Karel
TI - Growth rates and average optimality in risk-sensitive Markov decision chains
JO - Kybernetika
PY - 2008
PB - Institute of Information Theory and Automation AS CR
VL - 44
IS - 2
SP - 205
EP - 226
AB - In this note we focus attention on characterizations of policies maximizing growth rate of expected utility, along with average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection of nonnegative matrices we establish necessary and sufficient conditions guaranteeing independence of optimal values on starting state along with partition of the state space into subsets with constant optimal values. Finally, for models with growth rate independent of the starting state we show how the methods work if we minimize growth rate or average of the certainty equivalent.
LA - eng
KW - risk-sensitive Markov decision chains; average optimal policies; optimal growth rates; risk-sensitive Markov decision chains; average optimal policies; optimal growth rates
UR - http://eudml.org/doc/33922
ER -
References
top- Bellman R., On a quasi-linear equation, Canadian J. Math. 8 (1956), 198–202 (1956) Zbl0071.34902MR0077782
- Bellman R., Dynamic Programming, Princenton Univ. Press, Princenton, N.J. 1957 Zbl1205.90002MR0090477
- Berman A., Plemmons R. J., Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York 1979 Zbl0815.15016MR0544666
- Bielecki T. D., Hernández-Hernández, D., Pliska S. R., Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management, Math. Methods Oper. Res. 50 (1999), 167–188 (1999) MR1732397
- Cavazos-Cadena R., Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space, Math. Methods Oper. Res. 57 (2003), 253–285 Zbl1023.90076MR1973378
- Cavazos-Cadena R., Montes-de-Oca R., The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space, Math. Oper. Res. 28 (2003), 752–756 Zbl1082.90125MR2015911
- Cavazos-Cadena R., Montes-de-Oca R., Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion, J. Appl. Probab. 42 (2005), 905–918 Zbl1105.90101MR2203811
- Cavazos-Cadena R., Hernández-Hernández D., Solution fo the risk-sensitive average optimality equation in communicating Markov decision chains with finite state space: An alternative approach, Math. Methods Oper. Res. 56 (2002), 473–479 MR1953028
- Cavazos-Cadena R., Hernández-Hernández D., A characterization exponential functionals in finite Markov chains, Math. Methods Oper. Res. 60 (2004), 399–414 MR2106091
- Cavazos-Cadena R., Hernández-Hernández D., A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains, Ann. Appl. Probab. 15 (2005), 175–212 Zbl1076.93045MR2115041
- Gantmakher F. R., The Theory of Matrices, Chelsea, London 1959
- Howard R. A., Matheson J., Risk-sensitive Markov decision processes, Manag. Sci. 23 (1972), 356–369 (1972) Zbl0238.90007MR0292497
- Jaquette S. A., A utility criterion for Markov decision processes, Manag. Sci. 23 (1976), 43–49 (1976) Zbl0337.90053MR0439037
- Mandl P., Decomposable non-negative matrices in a dynamic programming problem, Czechoslovak Math. J. 20 (1970), 504–510 (1970) Zbl0209.22902MR0264978
- Rothblum U. G., Whittle P., Growth optimality for branching Markov decision chains, Math. Oper. Res. 7 (1982), 582–601 (1982) Zbl0498.90082MR0686533
- Sladký K., On dynamic programming recursions for multiplicative Markov decision chains, Math. Programming Study 6 (1976), 216–226 (1976) Zbl0374.60089MR0452725
- Sladký K., Successive approximation methods for dynamic programming models, In: Proc. of the Third Formator Symposium on the Analysis of Large-Scale Systems (J. Beneš and L. Bakule, eds.). Academia, Prague 1979, pp. 171–189 (1979) Zbl0496.90081
- Sladký K., Bounds on discrete dynamic programming recursions I, Kybernetika 16 (1980), 526–547 (1980) Zbl0454.90085MR0607292
- Whittle P., Optimization Over Time – Dynamic Programming and Stochastic Control, Volume II, Chapter 35, Wiley, Chichester 1983 MR0710833
- Zijm W. H. M., Nonnegative Matrices in Dynamic Programming, Mathematical Centre Tract, Amsterdam 1983 Zbl0526.90059MR0723868
Citations in EuDML Documents
top- Rolando Cavazos-Cadena, The risk-sensitive Poisson equation for a communicating Markov chain on a denumerable state space
- Alfredo Alanís-Durán, Rolando Cavazos-Cadena, An optimality system for finite average Markov decision chains under risk-aversion
- Karel Sladký, Risk-sensitive average optimality in Markov decision processes
- Jaicer López-Rivero, Rolando Cavazos-Cadena, Hugo Cruz-Suárez, Risk-sensitive Markov stopping games with an absorbing state
NotesEmbed ?
topTo embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.