Markov decision processes with time-varying discount factors and random horizon
Rocio Ilhuicatzi-Roldán; Hugo Cruz-Suárez; Selene Chávez-Rodríguez
Kybernetika (2017)
- Volume: 53, Issue: 1, pages 82-98
- ISSN: 0023-5954
How to cite
Ilhuicatzi-Roldán, Rocio, Cruz-Suárez, Hugo, and Chávez-Rodríguez, Selene. "Markov decision processes with time-varying discount factors and random horizon." Kybernetika 53.1 (2017): 82-98. <http://eudml.org/doc/287959>.
@article{Ilhuicatzi2017,
abstract = {This paper is related to Markov Decision Processes. The optimal control problem is to minimize the expected total discounted cost, with a non-constant discount factor. The discount factor is time-varying and it could depend on the state and the action. Furthermore, it is considered that the horizon of the optimization problem is given by a discrete random variable, that is, a random horizon is assumed. Under general conditions on Markov control model, using the dynamic programming approach, an optimality equation for both cases is obtained, namely, finite support and infinite support of the random horizon. The obtained results are illustrated by two examples, one of them related to optimal replacement.},
author = {Ilhuicatzi-Roldán, Rocio and Cruz-Suárez, Hugo and Chávez-Rodríguez, Selene},
journal = {Kybernetika},
keywords = {Markov decision process; dynamic programming; varying discount factor; random horizon},
language = {eng},
number = {1},
pages = {82-98},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Markov decision processes with time-varying discount factors and random horizon},
url = {http://eudml.org/doc/287959},
volume = {53},
year = {2017},
}
TY - JOUR
AU - Ilhuicatzi-Roldán, Rocio
AU - Cruz-Suárez, Hugo
AU - Chávez-Rodríguez, Selene
TI - Markov decision processes with time-varying discount factors and random horizon
JO - Kybernetika
PY - 2017
PB - Institute of Information Theory and Automation AS CR
VL - 53
IS - 1
SP - 82
EP - 98
AB - This paper is related to Markov Decision Processes. The optimal control problem is to minimize the expected total discounted cost, with a non-constant discount factor. The discount factor is time-varying and it could depend on the state and the action. Furthermore, it is considered that the horizon of the optimization problem is given by a discrete random variable, that is, a random horizon is assumed. Under general conditions on Markov control model, using the dynamic programming approach, an optimality equation for both cases is obtained, namely, finite support and infinite support of the random horizon. The obtained results are illustrated by two examples, one of them related to optimal replacement.
LA - eng
KW - Markov decision process; dynamic programming; varying discount factor; random horizon
UR - http://eudml.org/doc/287959
ER -