Mean-variance optimality for semi-Markov decision processes under first passage criteria

Xiangxiang Huang; Yonghui Huang

Kybernetika (2017)

  • Volume: 53, Issue: 1, pages 59-81
  • ISSN: 0023-5954

Abstract

This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of a total discounted reward up to the system's first entry to some target set, where the optimization is over a class of policies with a prescribed expected first passage reward. The reward rates are assumed to be possibly unbounded, while the discount factor may vary with states of the system and controls. We first develop suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Then, two examples are included to illustrate our results. Finally, we show how the results here reduce to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.
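
The optimization problem described in the abstract can be written out as follows. This is only a sketch under assumed notation, not the paper's own formulation: the symbols $x_t$, $a_t$ (state and control at time $t$), $r$ (reward rate), $\alpha$ (state- and control-dependent discount factor), $B$ (target set), $g$ (prescribed expected reward), and $\Pi_g$ are all introduced here for illustration, and all moments are assumed finite.

```latex
\begin{align*}
\tau_B &:= \inf\{t \ge 0 : x_t \in B\}
  && \text{(first passage time to the target set)} \\
R_B &:= \int_0^{\tau_B} \exp\!\Big(-\int_0^t \alpha(x_s, a_s)\,ds\Big)\, r(x_t, a_t)\,dt
  && \text{(discounted reward up to first passage)} \\
V^{\pi}(x) &:= E_x^{\pi}\big[R_B\big]
  && \text{(expected first passage reward)} \\
\sigma^2(\pi, x) &:= E_x^{\pi}\big[(R_B - V^{\pi}(x))^2\big]
  = E_x^{\pi}\big[R_B^2\big] - \big(V^{\pi}(x)\big)^2
  && \text{(first passage variance)} \\
\Pi_g &:= \{\pi : V^{\pi}(x) = g(x) \text{ for all } x\}
  && \text{(policies with the prescribed mean)} \\
\sigma^2(\pi^*, x) &= \inf_{\pi \in \Pi_g} \sigma^2(\pi, x)
  \quad \text{for all } x, \ \pi^* \in \Pi_g
  && \text{(mean-variance optimality)}
\end{align*}
```

Because $V^{\pi} = g$ for every $\pi \in \Pi_g$, minimizing the variance over $\Pi_g$ in this sketch is equivalent to minimizing the second moment $E_x^{\pi}[R_B^2]$ over the same class.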

How to cite

Huang, Xiangxiang, and Huang, Yonghui. "Mean-variance optimality for semi-Markov decision processes under first passage criteria." Kybernetika 53.1 (2017): 59-81. <http://eudml.org/doc/287948>.

@article{Huang2017,
abstract = {This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of a total discounted reward up to the system's first entry to some target set, where the optimization is over a class of policies with a prescribed expected first passage reward. The reward rates are assumed to be possibly unbounded, while the discount factor may vary with states of the system and controls. We first develop suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Then, two examples are included to illustrate our results. Finally, we show how the results here reduce to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.},
author = {Huang, Xiangxiang and Huang, Yonghui},
journal = {Kybernetika},
keywords = {semi-Markov decision processes; first passage time; unbounded reward rate; minimal variance; mean-variance optimal policy},
language = {eng},
number = {1},
pages = {59-81},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Mean-variance optimality for semi-Markov decision processes under first passage criteria},
url = {http://eudml.org/doc/287948},
volume = {53},
year = {2017},
}

TY - JOUR
AU - Huang, Xiangxiang
AU - Huang, Yonghui
TI - Mean-variance optimality for semi-Markov decision processes under first passage criteria
JO - Kybernetika
PY - 2017
PB - Institute of Information Theory and Automation AS CR
VL - 53
IS - 1
SP - 59
EP - 81
AB - This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of a total discounted reward up to the system's first entry to some target set, where the optimization is over a class of policies with a prescribed expected first passage reward. The reward rates are assumed to be possibly unbounded, while the discount factor may vary with states of the system and controls. We first develop suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Then, two examples are included to illustrate our results. Finally, we show how the results here reduce to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.
LA - eng
KW - semi-Markov decision processes; first passage time; unbounded reward rate; minimal variance; mean-variance optimal policy
UR - http://eudml.org/doc/287948
ER -

