Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors

Xiao Wu; Yanqiu Tang

Kybernetika (2021)

  • Issue: 2, pages 295-311
  • ISSN: 0023-5954

Abstract

This paper studies the constrained optimality problem for discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, Borel state and compact Borel action spaces, and possibly unbounded costs. Using properties of the occupation measures of policies, and by transforming the original constrained optimality problem into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.
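
The convex-program reformulation mentioned above can be illustrated with a generic sketch in the style of the occupation-measure literature. It is not the paper's own statement, and every symbol is an assumption introduced here for illustration: state space $X$, action space $A$, set of admissible state-action pairs $K$, transition kernel $Q$, initial distribution $\nu$, discount function $\alpha(x) \in (0,1)$, cost functions $c_0, \dots, c_N$, and constraint bounds $d_1, \dots, d_N$.

```latex
% Assumed notation, for illustration only (not taken from the paper).
% Occupation measure of a policy \pi with state-dependent discounting:
\eta^{\pi}(B \times C)
  = \sum_{t=0}^{\infty} E_{\nu}^{\pi}\!\left[
      \prod_{k=0}^{t-1} \alpha(x_k)\,
      \mathbf{1}\{x_t \in B,\ a_t \in C\}
    \right],
\qquad B \subseteq X,\ C \subseteq A \text{ Borel}.

% Each expected discounted cost is linear in the occupation measure:
V_n(\pi) = \int_{K} c_n(x,a)\, \eta^{\pi}(dx, da),
\qquad n = 0, 1, \dots, N.

% Constrained problem rewritten over measures \eta on K:
\begin{aligned}
\text{minimize}\quad   & \int_{K} c_0 \, d\eta \\
\text{subject to}\quad & \int_{K} c_n \, d\eta \le d_n,
                         \qquad n = 1, \dots, N, \\
                       & \eta(B \times A) = \nu(B)
                         + \int_{K} \alpha(x)\, Q(B \mid x, a)\, \eta(dx, da)
                         \quad \text{for all Borel } B \subseteq X.
\end{aligned}
```

Under these assumptions the objective and all constraints are linear in $\eta$, so the feasible set is convex. This is the sense in which a constrained problem of this kind becomes a convex program. A randomized stationary policy is then typically recovered by disintegrating an optimal $\eta$ with respect to its marginal on $X$.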

How to cite

Wu, Xiao, and Tang, Yanqiu. "Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors." Kybernetika (2021): 295-311. <http://eudml.org/doc/297751>.

@article{Wu2021,
abstract = {This paper studies the constrained optimality problem for discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, Borel state and compact Borel action spaces, and possibly unbounded costs. Using properties of the occupation measures of policies, and by transforming the original constrained optimality problem into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.},
author = {Wu, Xiao and Tang, Yanqiu},
journal = {Kybernetika},
keywords = {constrained optimality problem; discrete-time Markov decision processes; Borel state and action spaces; varying discount factors; unbounded costs},
language = {eng},
number = {2},
pages = {295-311},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors},
url = {http://eudml.org/doc/297751},
year = {2021},
}

TY - JOUR
AU - Wu, Xiao
AU - Tang, Yanqiu
TI - Constrained optimality problem of Markov decision processes with Borel spaces and varying discount factors
JO - Kybernetika
PY - 2021
PB - Institute of Information Theory and Automation AS CR
IS - 2
SP - 295
EP - 311
AB - This paper studies the constrained optimality problem for discrete-time Markov decision processes (DTMDPs) with state-dependent discount factors, Borel state and compact Borel action spaces, and possibly unbounded costs. Using properties of the occupation measures of policies, and by transforming the original constrained optimality problem into a convex program, we prove the existence of an optimal randomized stationary policy under reasonable conditions.
LA - eng
KW - constrained optimality problem; discrete-time Markov decision processes; Borel state and action spaces; varying discount factors; unbounded costs
UR - http://eudml.org/doc/297751
ER -
