An optimality system for finite average Markov decision chains under risk-aversion
Alfredo Alanís-Durán; Rolando Cavazos-Cadena
Kybernetika (2012)
- Volume: 48, Issue: 1, pages 83-104
- ISSN: 0023-5954
How to cite
Alanís-Durán, Alfredo, and Cavazos-Cadena, Rolando. "An optimality system for finite average Markov decision chains under risk-aversion." Kybernetika 48.1 (2012): 83-104. <http://eudml.org/doc/247175>.
@article{Alanís2012,
abstract = {This work concerns controlled Markov chains with finite state space and compact action sets. The decision maker is risk-averse with constant risk-sensitivity, and the performance of a control policy is measured by the long-run average cost criterion. Under standard continuity-compactness conditions, it is shown that the (possibly non-constant) optimal value function is characterized by a system of optimality equations from which an optimal stationary policy can be obtained. Also, it is shown that the optimal superior and inferior limit average cost functions coincide.},
author = {Alanís-Durán, Alfredo and Cavazos-Cadena, Rolando},
journal = {Kybernetika},
keywords = {partition of the state space; nonconstant optimal average cost; discounted approximations to the risk-sensitive average cost criterion; equality of superior and inferior limit risk-averse average criteria},
language = {eng},
number = {1},
pages = {83-104},
publisher = {Institute of Information Theory and Automation AS CR},
title = {An optimality system for finite average Markov decision chains under risk-aversion},
url = {http://eudml.org/doc/247175},
volume = {48},
year = {2012},
}
TY - JOUR
AU - Alanís-Durán, Alfredo
AU - Cavazos-Cadena, Rolando
TI - An optimality system for finite average Markov decision chains under risk-aversion
JO - Kybernetika
PY - 2012
PB - Institute of Information Theory and Automation AS CR
VL - 48
IS - 1
SP - 83
EP - 104
AB - This work concerns controlled Markov chains with finite state space and compact action sets. The decision maker is risk-averse with constant risk-sensitivity, and the performance of a control policy is measured by the long-run average cost criterion. Under standard continuity-compactness conditions, it is shown that the (possibly non-constant) optimal value function is characterized by a system of optimality equations from which an optimal stationary policy can be obtained. Also, it is shown that the optimal superior and inferior limit average cost functions coincide.
LA - eng
KW - partition of the state space; nonconstant optimal average cost; discounted approximations to the risk-sensitive average cost criterion; equality of superior and inferior limit risk-averse average criteria
UR - http://eudml.org/doc/247175
ER -