Mean-variance optimality for semi-Markov decision processes under first passage criteria
Xiangxiang Huang; Yonghui Huang
Kybernetika (2017)
- Volume: 53, Issue: 1, pages 59-81
- ISSN: 0023-5954
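Abstract
This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of a total discounted reward up to the system's first entry to some target set, where the optimization is over a class of policies with a prescribed expected first passage reward. The reward rates are assumed to be possibly unbounded, while the discount factor may vary with states of the system and controls. We first develop some suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Then, two examples are included to illustrate our results. At last, we show how the results here are reduced to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.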
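The first passage mean-variance criterion described in the abstract can be illustrated on a small finite model. The sketch below is not the authors' method. It assumes a finite, discrete-time special case (the abstract says the results reduce to discrete-time MDPs, but the paper itself works in Borel spaces with possibly unbounded reward rates), a discount factor alpha(x, a) in [0, 1) that varies with state and action, and a reward r(x, a) earned at each step before the chain first enters the target set B. For a fixed stationary policy, the mean V and second moment M of the discounted first passage reward solve the linear systems V(x) = r(x) + alpha(x) sum_{y not in B} P(x, y) V(y) and M(x) = r(x)^2 + 2 r(x) alpha(x) sum_{y not in B} P(x, y) V(y) + alpha(x)^2 sum_{y not in B} P(x, y) M(y), and the variance is M - V^2. The helper names `first_passage_moments` and `mean_variance_optimal` are hypothetical, and the policy search is plain enumeration of deterministic stationary policies, not the policy improvement algorithm proposed in the paper.

```python
import itertools

import numpy as np


def first_passage_moments(P, r, alpha, target):
    """Mean and variance of the discounted reward earned until the first entry into `target`.

    Hypothetical helper for a finite, discrete-time special case (not the paper's setting).
    P: (n, n) transition matrix under a fixed stationary policy.
    r, alpha: (n,) one-step reward and discount factor, with 0 <= alpha < 1.
    target: collection of target-state indices; starting there yields reward 0.
    """
    n = len(r)
    target = set(target)
    C = np.array([x for x in range(n) if x not in target], dtype=int)  # non-target states
    mean = np.zeros(n)
    var = np.zeros(n)
    if C.size == 0:
        return mean, var
    P_CC = P[np.ix_(C, C)]  # transitions that stay outside the target set
    r_C, a_C = r[C], alpha[C]
    I = np.eye(C.size)
    # Mean: V = r + alpha * (P_CC @ V), with V = 0 on the target set.
    V = np.linalg.solve(I - a_C[:, None] * P_CC, r_C)
    # Second moment: M = r^2 + 2 r alpha (P_CC @ V) + alpha^2 (P_CC @ M).
    rhs = r_C**2 + 2.0 * r_C * a_C * (P_CC @ V)
    M = np.linalg.solve(I - (a_C**2)[:, None] * P_CC, rhs)
    mean[C] = V
    var[C] = M - V**2
    return mean, var


def mean_variance_optimal(P_sa, r_sa, alpha_sa, target, x0, prescribed_mean, tol=1e-8):
    """Brute-force search over deterministic stationary policies (exponential in n).

    Keeps policies whose expected first passage reward from x0 is within `tol` of
    `prescribed_mean` and returns (policy, mean, variance) with the smallest variance,
    or None if no policy meets the constraint.
    P_sa: (n, m, n) transition probabilities; r_sa, alpha_sa: (n, m).
    """
    n, m = r_sa.shape
    states = np.arange(n)
    best = None
    for f in itertools.product(range(m), repeat=n):
        acts = np.array(f)
        mean, var = first_passage_moments(
            P_sa[states, acts], r_sa[states, acts], alpha_sa[states, acts], target
        )
        if abs(mean[x0] - prescribed_mean) <= tol and (best is None or var[x0] < best[2]):
            best = (f, mean[x0], var[x0])
    return best


# Toy model (hypothetical numbers): state 1 is the target set.
# In state 0, action 0 stays with probability 0.5 and earns 1 per step; action 1 jumps
# straight to the target and earns 1/0.55, matching action 0's expected reward.
P_sa = np.array([[[0.5, 0.5], [0.0, 1.0]],
                 [[0.0, 1.0], [0.0, 1.0]]])
r_sa = np.array([[1.0, 1.0 / 0.55], [0.0, 0.0]])
alpha_sa = np.full((2, 2), 0.9)
print(mean_variance_optimal(P_sa, r_sa, alpha_sa, target={1}, x0=0, prescribed_mean=1.0 / 0.55))
```

In the toy model both actions in state 0 give the same expected reward from state 0, so variance alone decides, and the search should pick action 1 (the direct jump), whose first passage reward is deterministic. Enumeration grows exponentially with the number of states, so it is only useful for checking tiny examples.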
How to cite
Huang, Xiangxiang, and Huang, Yonghui. "Mean-variance optimality for semi-Markov decision processes under first passage criteria." Kybernetika 53.1 (2017): 59-81. <http://eudml.org/doc/287948>.
@article{Huang2017,
abstract = {This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of a total discounted reward up to the system's first entry to some target set, where the optimization is over a class of policies with a prescribed expected first passage reward. The reward rates are assumed to be possibly unbounded, while the discount factor may vary with states of the system and controls. We first develop some suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Then, two examples are included to illustrate our results. At last, we show how the results here are reduced to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.},
author = {Huang, Xiangxiang and Huang, Yonghui},
journal = {Kybernetika},
keywords = {semi-Markov decision processes; first passage time; unbounded reward rate; minimal variance; mean-variance optimal policy},
language = {eng},
number = {1},
pages = {59--81},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Mean-variance optimality for semi-Markov decision processes under first passage criteria},
url = {http://eudml.org/doc/287948},
volume = {53},
year = {2017},
}
TY  - JOUR
AU  - Huang, Xiangxiang
AU  - Huang, Yonghui
TI  - Mean-variance optimality for semi-Markov decision processes under first passage criteria
JO  - Kybernetika
PY  - 2017
PB  - Institute of Information Theory and Automation AS CR
VL  - 53
IS  - 1
SP  - 59
EP  - 81
AB  - This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of a total discounted reward up to the system's first entry to some target set, where the optimization is over a class of policies with a prescribed expected first passage reward. The reward rates are assumed to be possibly unbounded, while the discount factor may vary with states of the system and controls. We first develop some suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing an optimal policy. Then, two examples are included to illustrate our results. At last, we show how the results here are reduced to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.
LA  - eng
KW  - semi-Markov decision processes; first passage time; unbounded reward rate; minimal variance; mean-variance optimal policy
UR  - http://eudml.org/doc/287948
ER  -
References
- Berument, H., Kilinc, Z., Ozlale, U., Phys. A 333 (2004), 317-324. MR2100223; DOI 10.1016/j.physa.2003.10.039
- Baykal-Gürsoy, M., Gürsoy, K., Probab. Engrg. Inform. Sci. 21 (2007), 635-657. MR2357126; DOI 10.1017/S026996480700037X
- Bäuerle, N., Rieder, U., Universitext, Springer, Heidelberg 2011. Zbl1236.90004; MR2808878; DOI 10.1007/978-3-642-18324-9
- Collins, E., OR Spektrum 19 (1997), 35-39. Zbl0894.90161; MR1464393; DOI 10.1007/s002910050017
- Costa, O. L. V., Maiali, A. C., Pinto, A. de C., IEEE Trans. Automat. Control 55 (2010), 1704-1709. MR2675836; DOI 10.1109/tac.2010.2046923
- Filar, J. A., Kallenberg, L. C. M., Lee, H. M., Math. Oper. Res. 14 (1989), 147-161. Zbl0676.90096; MR0984562; DOI 10.1287/moor.14.1.147
- Fu, C. P., Lari-Lavassani, A., Li, X., European J. Oper. Res. 200 (2010), 312-319. Zbl1183.91192; MR2561109; DOI 10.1016/j.ejor.2009.01.005
- Guo, X. P., Hernández-Lerma, O., Springer-Verlag, Berlin 2009. Zbl1209.90002; MR2554588; DOI 10.1007/978-3-642-02547-1
- Guo, X. P., Song, X. Y., IEEE Trans. Automat. Control 54 (2009), 2151-2157. MR2567941; DOI 10.1109/tac.2009.2023833
- Guo, X. P., Ye, L. E., Yin, G., European J. Oper. Res. 220 (2012), 423-429. Zbl1253.90214; MR2908853; DOI 10.1016/j.ejor.2012.01.051
- Guo, X. P., Huang, X. X., Zhang, Y., SIAM J. Control Optim. 53 (2015), 1406-1424. Zbl1322.90108; MR3352600; DOI 10.1137/140968872
- Hu, Q. Y., J. Math. Anal. Appl. 203 (1996), 1-12. Zbl0858.90135; MR1412477; DOI 10.1006/jmaa.1996.9999
- Hernández-Lerma, O., Lasserre, J. B., Springer-Verlag, New York 1999. Zbl0928.93002; MR1697198; DOI 10.1007/978-1-4612-0561-6
- Hernández-Lerma, O., Vega-Amaya, O., Carrasco, G., SIAM J. Control Optim. 38 (1999), 79-93. Zbl0951.93074; MR1740606; DOI 10.1137/S0363012998340673
- Haberman, S., Sung, J. H., Insurance Math. Econom. 36 (2005), 103-116. Zbl1111.91023; MR2122668; DOI 10.1016/j.insmatheco.2004.10.006
- Huang, Y. H., Guo, X. P., Acta Math. Appl. Sin. Engl. Ser. 27 (2011), 177-190. Zbl1235.90177; MR2784052; DOI 10.1007/s10255-011-0061-2
- Huang, Y. H., Guo, X. P., Song, X. Y., J. Optim. Theory Appl. 150 (2011), 395-415. Zbl1222.90076; MR2818928; DOI 10.1007/s10957-011-9813-7
- Huang, Y. H., Guo, X. P., Optimization, Control, and Applications of Stochastic Systems, pp. 181-202, Systems Control Found. Appl., Birkhäuser/Springer, New York 2012. MR2961386; DOI 10.1007/978-0-8176-8337-5_11
- Huang, Y. H., Guo, X. P., Appl. Math. Optim. 72 (2015), 233-259. Zbl1343.93100; MR3394396; DOI 10.1007/s00245-014-9278-9
- Jaquette, S. C., Ann. Statist. 3 (1975), 547-553. Zbl0321.90051; MR0363493; DOI 10.1214/aos/1176343087
- Kurano, M., J. Math. Anal. Appl. 123 (1987), 572-583. Zbl0619.90080; MR0883710; DOI 10.1016/0022-247x(87)90332-5
- Kharroubi, I., Lim, T., Appl. Math. Optim. 68 (2013), 413-444. MR3131502; DOI 10.1007/s00245-013-9213-5
- Lee, M. J., Li, W. J., Econom. Lett. 86 (2005), 339-346. Zbl1254.91733; MR2124417; DOI 10.1016/j.econlet.2004.09.002
- Mandl, P., On the variance in controlled Markov chains, Kybernetika 7 (1971), 1-12. Zbl0215.25902; MR0286178
- Mannor, S., Tsitsiklis, J. N., European J. Oper. Res. 231 (2013), 645-653. Zbl1317.90318; MR3092864; DOI 10.1016/j.ejor.2013.06.019
- Markowitz, H. M., Portfolio Selection: Efficient Diversification of Investments, John Wiley and Sons, Inc., New York 1959. MR0103768
- Prieto-Rumeau, T., Hernández-Lerma, O., Math. Methods Oper. Res. 70 (2009), 527-540. Zbl1177.93101; MR2558430; DOI 10.1007/s00186-008-0276-z
- Sobel, M. J., J. Appl. Probab. 19 (1982), 794-802. Zbl0503.90091; MR0675143; DOI 10.1017/s0021900200023123
- White, D. J., OR Spektrum 14 (1992), 79-83. Zbl0768.90087; MR1175342; DOI 10.1007/bf01720350
- Wu, X., Guo, X. P., J. Appl. Probab. 52 (2015), 441-456. Zbl1327.90374; MR3372085; DOI 10.1017/s0021900200012560
- Zhou, X. Y., Yin, G., SIAM J. Control Optim. 42 (2003), 1466-1482. Zbl1175.91169; MR2044805; DOI 10.1137/s0363012902405583
- Zhu, Q. X., Guo, X. P., Stoch. Anal. Appl. 25 (2007), 577-592. Zbl1152.90646; MR2321898; DOI 10.1080/07362990701282807