Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
Beatris A. Escobedo-Trujillo; Carmen G. Higuera-Chan
Kybernetika (2019)
- Volume: 55, Issue: 1, pages 166-182
- ISSN: 0023-5954
How to cite
Escobedo-Trujillo, Beatris A., and Higuera-Chan, Carmen G. "Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs." Kybernetika 55.1 (2019): 166-182. <http://eudml.org/doc/294596>.
@article{Escobedo2019,
abstract = {In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action-dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi_n)$, $n=0,1,\ldots$, with state-action-dependent discount factors of the form $\alpha_n(x_n,a_n)$, where $a_n$ and $\xi_n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace\alpha_n\rbrace$, $\lbrace c_n\rbrace$ and $\lbrace G_n\rbrace$ converge, in a certain sense, to $\alpha_\infty$, $c_\infty$ and $G_\infty$, our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system $\mathcal{M}_\infty$ corresponding to $\alpha_\infty$, $c_\infty$ and $G_\infty$. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.},
author = {Escobedo-Trujillo, Beatris A. and Higuera-Chan, Carmen G.},
journal = {Kybernetika},
keywords = {discounted optimality; non-constant discount factor; time-varying Markov decision processes},
language = {eng},
number = {1},
pages = {166--182},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs},
url = {http://eudml.org/doc/294596},
volume = {55},
year = {2019},
}
TY  - JOUR
AU  - Escobedo-Trujillo, Beatris A.
AU  - Higuera-Chan, Carmen G.
TI  - Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
JO  - Kybernetika
PY  - 2019
PB  - Institute of Information Theory and Automation AS CR
VL  - 55
IS  - 1
SP  - 166
EP  - 182
AB  - In this paper we are concerned with a class of time-varying discounted Markov decision models $\mathcal{M}_n$ with unbounded costs $c_n$ and state-action-dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation $x_{n+1}=G_n(x_n,a_n,\xi_n)$, $n=0,1,\ldots$, with state-action-dependent discount factors of the form $\alpha_n(x_n,a_n)$, where $a_n$ and $\xi_n$ are the control and the random disturbance at time $n$, respectively. Assuming that the sequences of functions $\lbrace\alpha_n\rbrace$, $\lbrace c_n\rbrace$ and $\lbrace G_n\rbrace$ converge, in a certain sense, to $\alpha_\infty$, $c_\infty$ and $G_\infty$, our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system $\mathcal{M}_\infty$ corresponding to $\alpha_\infty$, $c_\infty$ and $G_\infty$. Finally, we illustrate our results and their applicability in a class of semi-Markov control models.
LA  - eng
KW  - discounted optimality; non-constant discount factor; time-varying Markov decision processes
UR  - http://eudml.org/doc/294596
ER  -
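To make the role of a state-action-dependent discount factor concrete, here is a minimal sketch, not taken from the paper, of value iteration for a finite analogue of the limit model $\mathcal{M}_\infty$ described in the abstract. It assumes finitely many states, actions and disturbance values, bounded costs, and $\alpha_\infty(x,a) \le \bar\alpha < 1$, so that the operator $Tv(x)=\min_a\{c_\infty(x,a)+\alpha_\infty(x,a)\sum_s p(s)\,v(G_\infty(x,a,s))\}$ is a contraction in the sup norm. The paper itself works with unbounded costs, where these simplifications do not hold. The function names `q_value` and `value_iteration` and all parameters are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical finite analogue of the limit model M_inf (illustration only):
#   states x in {0, ..., n_states-1}, actions a in {0, ..., n_actions-1},
#   disturbances s with probabilities p(s), dynamics x' = G(x, a, s),
#   one-stage cost c(x, a), discount factor alpha(x, a) < 1.
# Assumes finite spaces and bounded costs, unlike the paper's setting.


def q_value(
    x: int,
    a: int,
    V: List[float],
    disturbances: Sequence[Tuple[int, float]],
    G: Callable[[int, int, int], int],
    cost: Callable[[int, int], float],
    alpha: Callable[[int, int], float],
) -> float:
    """c(x, a) + alpha(x, a) * E[V(G(x, a, s))] for one state-action pair."""
    expected_next = sum(p * V[G(x, a, s)] for s, p in disturbances)
    return cost(x, a) + alpha(x, a) * expected_next


def value_iteration(
    n_states: int,
    n_actions: int,
    disturbances: Sequence[Tuple[int, float]],
    G: Callable[[int, int, int], int],
    cost: Callable[[int, int], float],
    alpha: Callable[[int, int], float],
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Iterate V <- T V starting from V = 0; return the value and a greedy policy."""
    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [
            min(q_value(x, a, V, disturbances, G, cost, alpha) for a in range(n_actions))
            for x in range(n_states)
        ]
        # Sup-norm change between successive iterates.
        change = max(abs(u - v) for u, v in zip(new_V, V))
        V = new_V
        if change < tol:
            break
    # Greedy stationary policy with respect to the final value estimate.
    policy = []
    for x in range(n_states):
        q = [q_value(x, a, V, disturbances, G, cost, alpha) for a in range(n_actions)]
        policy.append(q.index(min(q)))
    return V, policy


if __name__ == "__main__":
    # One state, two actions: action 0 costs 1.0 with alpha = 0.5,
    # action 1 costs 0.5 with alpha = 0.8.
    V, policy = value_iteration(
        n_states=1,
        n_actions=2,
        disturbances=[(0, 1.0)],
        G=lambda x, a, s: 0,
        cost=lambda x, a: 1.0 if a == 0 else 0.5,
        alpha=lambda x, a: 0.5 if a == 0 else 0.8,
    )
    print(round(V[0], 6), policy)  # 2.0 [0]
```

In this toy example the action with the lower one-stage cost is not optimal: always choosing action 1 costs $0.5/(1-0.8)=2.5$, while always choosing action 0 costs $1/(1-0.5)=2$, which is the fixed point of $T$. The discount factor attached to each state-action pair changes which control is preferred, not just how future costs are weighted.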