First passage risk probability optimality for continuous time Markov decision processes

Haifeng Huo; Xian Wen

Kybernetika (2019)

  • Volume: 55, Issue: 1, page 114-133
  • ISSN: 0023-5954

Abstract

In this paper, we study continuous time Markov decision processes (CTMDPs) with a denumerable state space, a Borel action space, unbounded transition rates, and a nonnegative reward function. The optimality criterion considered is the first passage risk probability criterion. To ensure the non-explosion of the state processes, we first introduce a so-called drift condition, which is weaker than the well-known regularity condition for semi-Markov decision processes (SMDPs). Furthermore, under suitable conditions, we use a value iteration recursive approximation technique to establish the optimality equation, prove the uniqueness of the value function, and show the existence of optimal policies. Finally, two examples illustrate our results.
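
For readers who want a concrete, if much simplified, picture of the criterion and of the value iteration idea mentioned above, the sketch below is a discrete-time toy, not the continuous-time scheme of the paper. It minimises, over policies, the probability that the reward accumulated before the first entry into a target set stays at or below a threshold l; the model data (states, actions, transition law, rewards, threshold) are all invented for illustration.

# Toy, discrete-time sketch of value iteration for a first passage risk
# probability criterion.  Everything here (states, target set, transition
# law, rewards, threshold) is invented for illustration; the paper itself
# treats continuous-time models with unbounded transition rates.

states, actions, target = [0, 1, 2], ["a", "b"], {2}   # target set B = {2}

# transition law P[(s, a)][s'] and nonnegative integer one-step rewards
P = {(0, "a"): {0: 0.5, 1: 0.5}, (0, "b"): {1: 0.8, 2: 0.2},
     (1, "a"): {0: 0.3, 2: 0.7}, (1, "b"): {1: 0.6, 2: 0.4}}
r = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 1}

L = 5                      # largest reward threshold of interest
rmax = max(r.values())

# Boundary values: on the target set the accumulated reward is 0, so the
# event "reward <= l" has probability 1 when l >= 0; for l < 0 it is
# impossible because rewards are nonnegative.
F = {}
for s in states:
    for l in range(-rmax, L + 1):
        if l < 0:
            F[(s, l)] = 0.0
        elif s in target:
            F[(s, l)] = 1.0
        else:
            F[(s, l)] = 0.0            # start value iteration from below

# Value iteration sweeps of the recursion
#   F(s, l) = min_a  sum_{s'} P(s' | s, a) * F(s', l - r(s, a)),  s outside B.
for _ in range(200):
    new = dict(F)
    for s in states:
        if s in target:
            continue
        for l in range(0, L + 1):
            new[(s, l)] = min(
                sum(p * F[(s2, l - r[(s, a)])] for s2, p in P[(s, a)].items())
                for a in actions)
    F = new

for s in states:
    print(s, [round(F[(s, l)], 3) for l in range(L + 1)])

Here F[(s, l)] approximates the minimal risk probability from state s at threshold l in the toy model; in the paper the analogous value function is characterised as the unique solution of a continuous-time optimality equation.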

How to cite


Huo, Haifeng, and Xian Wen. "First passage risk probability optimality for continuous time Markov decision processes." Kybernetika 55.1 (2019): 114-133. <http://eudml.org/doc/294508>.

@article{Huo2019,
abstract = {In this paper, we study continuous time Markov decision processes (CTMDPs) with a denumerable state space, a Borel action space, unbounded transition rates, and a nonnegative reward function. The optimality criterion considered is the first passage risk probability criterion. To ensure the non-explosion of the state processes, we first introduce a so-called drift condition, which is weaker than the well-known regularity condition for semi-Markov decision processes (SMDPs). Furthermore, under suitable conditions, we use a value iteration recursive approximation technique to establish the optimality equation, prove the uniqueness of the value function, and show the existence of optimal policies. Finally, two examples illustrate our results.},
author = {Huo, Haifeng and Wen, Xian},
journal = {Kybernetika},
keywords = {continuous time Markov decision processes; first passage time; risk probability criterion; optimal policy},
language = {eng},
number = {1},
pages = {114-133},
publisher = {Institute of Information Theory and Automation AS CR},
title = {First passage risk probability optimality for continuous time Markov decision processes},
url = {http://eudml.org/doc/294508},
volume = {55},
year = {2019},
}

TY - JOUR
AU - Huo, Haifeng
AU - Wen, Xian
TI - First passage risk probability optimality for continuous time Markov decision processes
JO - Kybernetika
PY - 2019
PB - Institute of Information Theory and Automation AS CR
VL - 55
IS - 1
SP - 114
EP - 133
AB - In this paper, we study continuous time Markov decision processes (CTMDPs) with a denumerable state space, a Borel action space, unbounded transition rates, and a nonnegative reward function. The optimality criterion considered is the first passage risk probability criterion. To ensure the non-explosion of the state processes, we first introduce a so-called drift condition, which is weaker than the well-known regularity condition for semi-Markov decision processes (SMDPs). Furthermore, under suitable conditions, we use a value iteration recursive approximation technique to establish the optimality equation, prove the uniqueness of the value function, and show the existence of optimal policies. Finally, two examples illustrate our results.
LA - eng
KW - continuous time Markov decision processes; first passage time; risk probability criterion; optimal policy
UR - http://eudml.org/doc/294508
ER -
