Strong average optimality criterion for continuous-time Markov decision processes
Kybernetika (2014)
- Volume: 50, Issue: 6, page 950-977
- ISSN: 0023-5954
How to cite
topWei, Qingda, and Chen, Xian. "Strong average optimality criterion for continuous-time Markov decision processes." Kybernetika 50.6 (2014): 950-977. <http://eudml.org/doc/262139>.
@article{Wei2014,
abstract = {This paper deals with continuous-time Markov decision processes with unbounded transition rates under the strong average cost criterion. The state and action spaces are Borel spaces, and the costs are allowed to be unbounded from above and from below. Under mild conditions, we first prove that the finite-horizon optimal value function is a solution to the optimality equation for the case of uncountable state spaces and unbounded transition rates, and that there exists an optimal deterministic Markov policy. Then, using two average optimality inequalities, we show that the set of all strong average optimal policies coincides with the set of all average optimal policies, and thus obtain the existence of strong average optimal policies. Furthermore, employing the technique of the skeleton chains of controlled continuous-time Markov chains and the Chapman-Kolmogorov equation, we give a new set of sufficient conditions imposed on the primitive data of the model for the verification of the uniform exponential ergodicity of continuous-time Markov chains governed by stationary policies. Finally, we illustrate our main results with an example.},
author = {Wei, Qingda, Chen, Xian},
journal = {Kybernetika},
keywords = {continuous-time Markov decision processes; strong average optimality criterion; finite-horizon expected total cost criterion; unbounded transition rates; optimal policy; optimal value function},
language = {eng},
number = {6},
pages = {950-977},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Strong average optimality criterion for continuous-time Markov decision processes},
url = {http://eudml.org/doc/262139},
volume = {50},
year = {2014},
}
TY - JOUR
AU - Wei, Qingda
AU - Chen, Xian
TI - Strong average optimality criterion for continuous-time Markov decision processes
JO - Kybernetika
PY - 2014
PB - Institute of Information Theory and Automation AS CR
VL - 50
IS - 6
SP - 950
EP - 977
AB - This paper deals with continuous-time Markov decision processes with unbounded transition rates under the strong average cost criterion. The state and action spaces are Borel spaces, and the costs are allowed to be unbounded from above and from below. Under mild conditions, we first prove that the finite-horizon optimal value function is a solution to the optimality equation for the case of uncountable state spaces and unbounded transition rates, and that there exists an optimal deterministic Markov policy. Then, using two average optimality inequalities, we show that the set of all strong average optimal policies coincides with the set of all average optimal policies, and thus obtain the existence of strong average optimal policies. Furthermore, employing the technique of the skeleton chains of controlled continuous-time Markov chains and the Chapman-Kolmogorov equation, we give a new set of sufficient conditions imposed on the primitive data of the model for the verification of the uniform exponential ergodicity of continuous-time Markov chains governed by stationary policies. Finally, we illustrate our main results with an example.
LA - eng
KW - continuous-time Markov decision processes; strong average optimality criterion; finite-horizon expected total cost criterion; unbounded transition rates; optimal policy; optimal value function
UR - http://eudml.org/doc/262139
ER -
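The abstract concerns the long-run average cost of stationary policies for continuous-time Markov decision processes. For a fixed stationary deterministic policy on a finite (here, truncated) state space, that quantity can be computed directly from the stationary distribution of the induced chain's generator. The following minimal sketch uses a hypothetical controlled birth-death queue — an illustration assumed for this note, not the example from the paper — where the policy, costs, rates, and truncation level are all invented for the demonstration.

```python
import numpy as np

# Hypothetical truncated queueing model (assumption for illustration).
# States 0..N; a stationary deterministic policy f selects a service
# rate in each state. The long-run average cost of the policy is
# sum_i pi(i) * c(i, f(i)), where pi solves pi @ Q = 0, sum(pi) = 1,
# and Q is the generator of the chain induced by the policy.

N = 20                      # truncation level
arrival = 1.0               # arrival rate
rates = {0: 1.5, 1: 3.0}    # action 0: slow server, action 1: fast server

def policy(i):
    # simple threshold policy: switch to the fast server in long queues
    return 1 if i >= 5 else 0

def cost(i, a):
    # holding cost per unit time plus a surcharge for the fast server
    return i + (2.0 if a == 1 else 0.0)

# Build the generator Q of the induced continuous-time Markov chain.
Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = arrival              # birth (arrival)
    if i > 0:
        Q[i, i - 1] = rates[policy(i)]     # death (service completion)
    Q[i, i] = -Q[i].sum()                  # diagonal makes rows sum to 0

# Stationary distribution: least-squares solve of pi @ Q = 0 with the
# normalization sum(pi) = 1 appended as an extra equation.
A = np.vstack([Q.T, np.ones(N + 1)])
b = np.zeros(N + 2)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

avg_cost = sum(pi[i] * cost(i, policy(i)) for i in range(N + 1))
print(round(avg_cost, 4))
```

This computes the average cost of one fixed policy; the paper's strong average criterion additionally requires near-optimal behavior of the pathwise averages, which a finite-state numerical check like this can only suggest, not prove.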