Risk-sensitive average optimality in Markov decision processes
Kybernetika (2018)
- Volume: 54, Issue: 6, pages 1218-1230
- ISSN: 0023-5954
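Abstract
This note focuses on finding policies that optimize risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the reward variance. For communicating Markov processes and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies.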
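As a hedged illustration of the evaluation described in the abstract, the exponential utility and its certainty equivalent can be written out as below. The symbols are notation assumed here rather than taken from this page: $\xi$ denotes the total reward, $\gamma$ the risk-sensitive coefficient, $u^{\gamma}$ the utility function and $Z^{\gamma}$ the certainty equivalent. The small-$\gamma$ expansion is the standard cumulant expansion and shows how the expectation and the variance of the reward enter.

```latex
\begin{align*}
u^{\gamma}(\xi) &= \operatorname{sign}(\gamma)\, e^{\gamma \xi}, \qquad \gamma \neq 0, \\
Z^{\gamma}(\xi) &= \frac{1}{\gamma} \ln \mathsf{E}\, e^{\gamma \xi}
  = \mathsf{E}\,\xi + \frac{\gamma}{2}\, \operatorname{Var}\xi + O(\gamma^{2})
  \quad (\gamma \to 0).
\end{align*}
```

Letting $\gamma \to 0$ recovers the risk-neutral case, in which only the expectation $\mathsf{E}\,\xi$ matters.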
How to cite
Sladký, Karel. "Risk-sensitive average optimality in Markov decision processes." Kybernetika 54.6 (2018): 1218-1230. <http://eudml.org/doc/294578>.
@article{Sladký2018,
abstract = {This note focuses on finding policies that optimize risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the reward variance. For communicating Markov processes and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies.},
author = {Sladký, Karel},
journal = {Kybernetika},
keywords = {controlled Markov processes; finite state space; asymptotic behavior; risk-sensitive average optimality},
language = {eng},
number = {6},
pages = {1218-1230},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Risk-sensitive average optimality in Markov decision processes},
url = {http://eudml.org/doc/294578},
volume = {54},
year = {2018},
}
TY - JOUR
AU - Sladký, Karel
TI - Risk-sensitive average optimality in Markov decision processes
JO - Kybernetika
PY - 2018
PB - Institute of Information Theory and Automation AS CR
VL - 54
IS - 6
SP - 1218
EP - 1230
AB - This note focuses on finding policies that optimize risk-sensitive optimality criteria in Markov decision chains. To this end we assume that the total reward generated by the Markov process is evaluated by an exponential utility function with a given risk-sensitive coefficient. The ratio of the first two moments depends on the value of the risk-sensitive coefficient; if the risk-sensitive coefficient is equal to zero we speak of risk-neutral models. Observe that the first moment of the generated reward corresponds to the expectation of the total reward and the second central moment to the reward variance. For communicating Markov processes and for some specific classes of unichain processes, the long-run risk-sensitive average reward is independent of the starting state. In this note we present a necessary and sufficient condition for the existence of optimal policies independent of the starting state in unichain models and characterize the class of average risk-sensitive optimal policies.
LA - eng
KW - controlled Markov processes; finite state space; asymptotic behavior; risk-sensitive average optimality
UR - http://eudml.org/doc/294578
ER -
Citations in EuDML Documents
- Rubén Becerril-Borja, Raúl Montes-de-Oca, Incomplete information and risk sensitive analysis of sequential games without a predetermined order of turns
- Rolando Cavazos-Cadena, Luis Rodríguez-Gutiérrez, Dulce María Sánchez-Guillermo, Markov stopping games with an absorbing state and total reward criterion
- Jaicer López-Rivero, Rolando Cavazos-Cadena, Hugo Cruz-Suárez, Risk-sensitive Markov stopping games with an absorbing state