Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs

Beatris A. Escobedo-Trujillo; Carmen G. Higuera-Chan

Kybernetika (2019)

  • Volume: 55, Issue: 1, page 166-182
  • ISSN: 0023-5954

Abstract

In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action-dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation $x_{n+1} = G_n(x_n, a_n, \xi_n)$, $n = 0, 1, \ldots$, with state-action-dependent discount factors of the form $\alpha_n(x_n, a_n)$, where $a_n$ and $\xi_n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\{\alpha_n\}$, $\{c_n\}$ and $\{G_n\}$ converge, in a certain sense, to $\alpha_\infty$, $c_\infty$ and $G_\infty$, our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system $\mathcal{M}_\infty$ corresponding to $\alpha_\infty$, $c_\infty$ and $G_\infty$. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.
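To make the setting concrete, the following minimal Python sketch simulates one trajectory of a system of the form described in the abstract and accumulates a cost in which the discount applied at stage n is the product of the earlier state-action-dependent factors. The abstract does not state the performance criterion, so this multiplicative form is an assumption, as are the function name `discounted_cost`, the finite truncation horizon, the Gaussian disturbance, and the particular choices of `G`, `c`, `alpha` and `policy` below; none of them is taken from the paper.

```python
import random


def discounted_cost(x0, policy, G, c, alpha, horizon, rng):
    """Truncated cost along one simulated path (assumed criterion):
    sum over n < horizon of (prod_{k<n} alpha_k(x_k, a_k)) * c_n(x_n, a_n)."""
    x, weight, total = x0, 1.0, 0.0
    for n in range(horizon):
        a = policy(n, x)              # control a_n chosen from the current state
        total += weight * c(n, x, a)  # stage cost, discounted by the running product
        weight *= alpha(n, x, a)      # fold in the state-action-dependent factor
        xi = rng.gauss(0.0, 1.0)      # random disturbance xi_n (assumed Gaussian)
        x = G(n, x, a, xi)            # x_{n+1} = G_n(x_n, a_n, xi_n)
    return total


# Hypothetical time-varying model whose components converge as n grows.
def G(n, x, a, xi):
    return (0.5 + 1.0 / (n + 2)) * x + a + xi      # tends to 0.5*x + a + xi


def c(n, x, a):
    return x * x + (1.0 + 1.0 / (n + 1)) * a * a   # unbounded in x and a


def alpha(n, x, a):
    # Takes values in (0.5, 0.9); depends on time, state and action.
    return 0.5 + 0.4 * (1.0 - 1.0 / (n + 2)) / (1.0 + x * x + a * a)


def policy(n, x):
    return -0.25 * x                               # simple stationary feedback rule


if __name__ == "__main__":
    rng = random.Random(0)
    print(discounted_cost(1.0, policy, G, c, alpha, horizon=50, rng=rng))
```

Averaging `discounted_cost` over many independent seeds would give a Monte Carlo estimate of the expected truncated cost of the chosen policy under these illustrative assumptions.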

How to cite

Escobedo-Trujillo, Beatris A., and Higuera-Chan, Carmen G. "Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs." Kybernetika 55.1 (2019): 166-182. <http://eudml.org/doc/294596>.

@article{Escobedo2019,
abstract = {In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action dependent discount factors. Specifically we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi _n), n=0,1,\ldots $, with state-action dependent discount factors of the form $\alpha _n(x_n,a_n)$, where $a_n$ and $\xi _n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace \alpha _n\rbrace $, $\lbrace c_n\rbrace $ and $\lbrace G_n\rbrace $ converge, in certain sense, to $\alpha _\infty $, $c_\infty $ and $G_\infty $, our objective is to introduce a suitable control model for this class of systems and then, to show the existence of optimal policies for the limit system $\mathcal{M}_\infty $ corresponding to $\alpha _\infty $, $c_\infty $ and $G_\infty $. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.},
author = {Escobedo-Trujillo, Beatris A. and Higuera-Chan, Carmen G.},
journal = {Kybernetika},
keywords = {discounted optimality; non-constant discount factor; time-varying Markov decision processes},
language = {eng},
number = {1},
pages = {166-182},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs},
url = {http://eudml.org/doc/294596},
volume = {55},
year = {2019},
}

TY - JOUR
AU - Escobedo-Trujillo, Beatris A.
AU - Higuera-Chan, Carmen G.
TI - Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
JO - Kybernetika
PY - 2019
PB - Institute of Information Theory and Automation AS CR
VL - 55
IS - 1
SP - 166
EP - 182
AB - In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal {M}_n$ with unbounded costs $c_n$ and state-action dependent discount factors. Specifically we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi _n), n=0,1,\ldots $, with state-action dependent discount factors of the form $\alpha _n(x_n,a_n)$, where $a_n$ and $\xi _n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace \alpha _n\rbrace $,$\lbrace c_n\rbrace $ and $\lbrace G_n\rbrace $ converge, in certain sense, to $\alpha _\infty $, $c_\infty $ and $G_\infty $, our objective is to introduce a suitable control model for this class of systems and then, to show the existence of optimal policies for the limit system $\mathcal {M}_\infty $ corresponding to $\alpha _\infty $, $c_\infty $ and $G_\infty $. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.
LA - eng
KW - discounted optimality; non-constant discount factor; time-varying Markov decision processes
UR - http://eudml.org/doc/294596
ER -
