Risk-sensitive average optimality in Markov decision processes

Karel Sladký

Kybernetika (2018)

  • Volume: 54, Issue: 6, page 1218-1230
  • ISSN: 0023-5954

Abstract

In this note attention is focused on finding policies optimizing risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the reward variance. For communicating Markov processes, and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies.
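The exponential-utility criterion sketched in the abstract has a classical computational form going back to Howard and Matheson: for an irreducible (communicating) chain, the long-run risk-sensitive average reward equals (1/λ) log ρ, where ρ is the Perron eigenvalue of the transition matrix reweighted by e^{λ·reward}, and as λ → 0 it recovers the risk-neutral average reward. The following is a minimal illustrative sketch, not code from the paper; the two-state chain and its numbers are made up for the example, and rewards are taken per current state for simplicity.

```python
# Sketch of the risk-sensitive average reward via the Perron eigenvalue,
# in the spirit of the Howard-Matheson exponential-utility approach.
# The chain and rewards below are illustrative assumptions, not from the paper.
import numpy as np

def risk_sensitive_average_reward(P, r, lam):
    """Long-run risk-sensitive average reward for an irreducible chain.

    P   : transition matrix (n x n)
    r   : one-stage rewards per current state (length n)
    lam : risk-sensitivity coefficient (lam != 0)

    Value = (1/lam) * log rho(D P), where D = diag(exp(lam * r)) and
    rho is the spectral radius. For a communicating chain this value
    is the same for every starting state.
    """
    D = np.diag(np.exp(lam * np.asarray(r, dtype=float)))
    rho = max(abs(np.linalg.eigvals(D @ P)))  # Perron eigenvalue
    return float(np.log(rho) / lam)

# Example two-state communicating chain (numbers chosen arbitrarily).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = [1.0, 3.0]

# As lam -> 0 the criterion approaches the risk-neutral average reward,
# i.e. the stationary expectation pi . r (here pi = (2/3, 1/3), so 5/3).
for lam in (1.0, 0.1, 0.001):
    print(lam, risk_sensitive_average_reward(P, r, lam))
```

For small positive λ the computed value sits slightly above the risk-neutral average 5/3, reflecting the second-moment (variance) correction that the abstract alludes to.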

How to cite


Sladký, Karel. "Risk-sensitive average optimality in Markov decision processes." Kybernetika 54.6 (2018): 1218-1230. <http://eudml.org/doc/294578>.

@article{Sladký2018,
abstract = {In this note attention is focused on finding policies optimizing risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the reward variance. For communicating Markov processes, and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {controlled Markov processes; finite state space; asymptotic behavior; risk-sensitive average optimality},
language = {eng},
number = {6},
pages = {1218-1230},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Risk-sensitive average optimality in Markov decision processes},
url = {http://eudml.org/doc/294578},
volume = {54},
year = {2018},
}

TY - JOUR
AU - Sladký, Karel
TI - Risk-sensitive average optimality in Markov decision processes
JO - Kybernetika
PY - 2018
PB - Institute of Information Theory and Automation AS CR
VL - 54
IS - 6
SP - 1218
EP - 1230
AB - In this note attention is focused on finding policies optimizing risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the reward variance. For communicating Markov processes, and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies.
LA - eng
KW - controlled Markov processes; finite state space; asymptotic behavior; risk-sensitive average optimality
UR - http://eudml.org/doc/294578
ER -

References

  1. Arapostathis, A., Borkar, V. S., Fernandez-Gaucherand, F., Ghosh, M. K., Marcus, S. I., 10.1137/0331018, SIAM J. Control Optim. 31 (1993), 282-344. MR1205981DOI10.1137/0331018
  2. Bather, J., 10.2307/1426039, Adv. Appl. Probab. 5 (1973), 328-339. MR0368790DOI10.2307/1426039
  3. Bielecki, T. D., Hernández-Hernández, D., Pliska, S. R., 10.1007/s001860050094, Math. Methods Oper. Res. 50 (1999), 167-188. MR1732397DOI10.1007/s001860050094
  4. Cavazos-Cadena, R., 10.1007/s001860200205, Math. Methods Oper. Res. 56 (2002), 181-196. MR1938210DOI10.1007/s001860200205
  5. Cavazos-Cadena, R., 10.1007/s001860200256, Math. Methods Oper. Res. 57 (2003), 2, 263-285. MR1973378DOI10.1007/s001860200256
  6. Cavazos-Cadena, R., 10.1007/s00186-008-0277-y, Math. Methods Oper. Res. 70 (2009), 541-566. MR2558431DOI10.1007/s00186-008-0277-y
  7. Cavazos-Cadena, R., Fernandez-Gaucherand, F., Controlled Markov chains with risk-sensitive criteria: average cost, optimality equations and optimal solutions., Math. Methods Oper. Res. 43 (1999), 121-139. MR1687362
  8. Cavazos-Cadena, R., Hernández-Hernández, D., 10.1007/s001860400373, Math. Methods Oper. Res. 60 (2004), 399-414. MR2106091DOI10.1007/s001860400373
  9. Cavazos-Cadena, R., Hernández-Hernández, D., 10.1214/105051604000000585, Ann. Appl. Probab. 15 (2005), 175-212. MR2115041DOI10.1214/105051604000000585
  10. Cavazos-Cadena, R., Hernández-Hernández, D., 10.1016/j.sysconle.2008.11.001, System Control Lett. 58 (2009), 254-258. MR2510639DOI10.1016/j.sysconle.2008.11.001
  11. Cavazos-Cadena, R., Montes-de-Oca, R., 10.1287/moor.28.4.752.20515, Math. Oper. Res. 28 (2003), 752-756. MR2015911DOI10.1287/moor.28.4.752.20515
  12. Cavazos-Cadena, R., Montes-de-Oca, R., 10.1017/s0021900200000991, J. Appl. Probab. 42 (2005), 905-918. MR2203811DOI10.1017/s0021900200000991
  13. Cavazos-Cadena, R., Feinberg, A., Montes-de-Oca, R., 10.1287/moor.25.4.657.12112, Math. Oper. Res. 25 (2000), 657-666. MR1855371DOI10.1287/moor.25.4.657.12112
  14. Gantmakher, F. R., The Theory of Matrices., Chelsea, London 1959. MR0107649
  15. Howard, R. A., Dynamic Programming and Markov Processes., MIT Press, Cambridge, Mass. 1960. MR0118514
  16. Howard, R. A., Matheson, J., 10.1287/mnsc.18.7.356, Manag. Sci. 23 (1972), 356-369. MR0292497DOI10.1287/mnsc.18.7.356
  17. Mandl, P., On the variance in controlled Markov chains., Kybernetika 7 (1971), 1-12. Zbl0215.25902MR0286178
  18. Mandl, P., 10.2307/1426206, Adv. Appl. Probab. 6 (1974), 40-60. MR0339876DOI10.2307/1426206
  19. Markowitz, H., 10.1111/j.1540-6261.1952.tb01525.x, J. Finance 7 (1952), 77-92. MR0103768DOI10.1111/j.1540-6261.1952.tb01525.x
  20. Markowitz, H., Portfolio Selection - Efficient Diversification of Investments., Wiley, New York 1959. MR0103768
  21. Puterman, M. L., 10.1002/9780470316887, Wiley, New York 1994. MR1270015DOI10.1002/9780470316887
  22. Ross, S. M., Introduction to Stochastic Dynamic Programming., Academic Press, New York 1983. MR0749232
  23. Sladký, K., Necessary and sufficient optimality conditions for average reward of controlled Markov chains., Kybernetika 9 (1973), 124-137. MR0363495
  24. Sladký, K., On the set of optimal controls for Markov chains with rewards., Kybernetika 10 (1974), 526-547. MR0378842
  25. Sladký, K., Growth rates and average optimality in risk-sensitive Markov decision chains., Kybernetika 44 (2008), 205-226. MR2428220
  26. Sladký, K., 10.1007/3-540-32539-5_125, In: Proc. 30th Int. Conf. Math. Meth. Economics 2012, Part II (J.Ramík and D.Stavárek, eds.), Silesian University, School of Business Administration, Karviná 2012, pp. 799-804. DOI10.1007/3-540-32539-5_125
  27. Sladký, K., Risk-sensitive and mean variance optimality in Markov decision processes., Acta Oeconomica Pragensia 7 (2013), 146-161. 
  28. Dijk, N. M. van, Sladký, K., 10.1017/s0021900200002412, J. Appl. Probab. 43 (2006), 1044-1052. MR2274635DOI10.1017/s0021900200002412
