Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
Evgueni I. Gordienko; J. Adolfo Minjárez-Sosa
Kybernetika (1998)
- Volume: 34, Issue: 2, page [217]-234
- ISSN: 0023-5954
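Abstract
We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi_t),\ t=0,1,\ldots$ with i.i.d. $\Re^k$-valued random vectors $\xi_t$ whose density $\rho$ is unknown. Assuming observability of $\xi_t$, we propose a procedure of statistical estimation of $\rho$ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.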
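The setting in the abstract (a recursion $x_{t+1}=F(x_t,a_t,\xi_t)$ driven by observable i.i.d. disturbances with unknown density $\rho$) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the inventory-style `F`, the placeholder threshold policy, the exponential disturbance law, and the Gaussian kernel estimator, which stands in for the paper's own estimation procedure.

```python
import math
import random


def F(x, a, xi):
    """Hypothetical dynamics: an inventory-style recursion (not from the paper)."""
    return max(x + a - xi, 0.0)


def kernel_density(samples, bandwidth):
    """Return a Gaussian kernel estimate of the density of `samples`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(s):
        return norm * sum(math.exp(-0.5 * ((s - xi) / bandwidth) ** 2) for xi in samples)

    return rho_hat


def simulate(T, seed=0):
    """Run the recursion for T steps, recording the observed disturbances."""
    rng = random.Random(seed)
    x, observed = 5.0, []
    for _ in range(T):
        a = 1.0 if x < 5.0 else 0.0   # placeholder policy, not an adaptive one
        xi = rng.expovariate(1.0)     # assumed true density rho(s) = exp(-s), s >= 0
        x = F(x, a, xi)
        observed.append(xi)           # xi_t is assumed observable, as in the abstract
    return x, observed


x_T, xis = simulate(T=2000)
rho_hat = kernel_density(xis, bandwidth=len(xis) ** -0.2)  # n^(-1/5) rule of thumb
step = 0.05
grid = [i * step for i in range(-40, 241)]                 # s in [-2, 12]
mass = sum(rho_hat(s) for s in grid) * step
l1_error = sum(abs(rho_hat(s) - (math.exp(-s) if s >= 0 else 0.0)) for s in grid) * step
print(f"estimated mass ~ {mass:.3f}, L1 error ~ {l1_error:.3f}")
```

An adaptive policy in the sense of the abstract would replace the fixed threshold rule with a decision computed from the current estimate of $\rho$; the sketch only shows the estimation step running alongside the state recursion.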
How to cite
Gordienko, Evgueni I., and Minjárez-Sosa, J. Adolfo. "Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion." Kybernetika 34.2 (1998): [217]-234. <http://eudml.org/doc/33349>.
@article{Gordienko1998,
abstract = {We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$ we propose the procedure of statistical estimation of $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for the processes with bounded costs.},
author = {Gordienko, Evgueni I. and Minjárez-Sosa, J. Adolfo},
journal = {Kybernetika},
keywords = {Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence},
language = {eng},
number = {2},
pages = {[217]-234},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion},
url = {http://eudml.org/doc/33349},
volume = {34},
year = {1998},
}
TY - JOUR
AU - Gordienko, Evgueni I.
AU - Minjárez-Sosa, J. Adolfo
TI - Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
JO - Kybernetika
PY - 1998
PB - Institute of Information Theory and Automation AS CR
VL - 34
IS - 2
SP - [217]
EP - 234
AB - We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$ we propose the procedure of statistical estimation of $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for the processes with bounded costs.
LA - eng
KW - Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence
UR - http://eudml.org/doc/33349
ER -
References
- Agrawal R., J. Appl. Probab. 28 (1991), 779–790. Zbl 0741.60070, MR 1133786, DOI 10.2307/3214681
- Ash R. B., Real Analysis and Probability, Academic Press, New York 1972. MR 0435320
- Cavazos–Cadena R., J. Optim. Theory Appl. 65 (1990), 191–207. MR 1051545, DOI 10.1007/BF01102341
- Dynkin E. B., Yushkevich A. A., Controlled Markov Processes, Springer–Verlag, New York 1979. MR 0554083
- Fernández–Gaucherand E., Arapostathis A., Marcus S. I., A methodology for the adaptive control of Markov chains under partial state information, In: Proc. of the 1992 Conf. on Information Sci. and Systems, Princeton, New Jersey, pp. 773–775 (1992)
- Fernández–Gaucherand E., Arapostathis A., Marcus S. I., IEEE Trans. Automat. Control 38 (1993), 987–993. Zbl 0786.93089, MR 1227213, DOI 10.1109/9.222316
- Gordienko E. I., Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504–518. Zbl 0577.93067
- Gordienko E. I., Controlled Markov sequences with slowly varying characteristics II. Adaptive optimal strategies, Soviet J. Comput. Systems Sci. 23 (1985), 87–93. Zbl 0618.93070, MR 0844298
- Gordienko E. I., Hernández–Lerma O., Average cost Markov control processes with weighted norms: value iteration, Appl. Math. 23 (1995), 219–237. Zbl 0829.93068, MR 1341224
- Gordienko E. I., Montes–de–Oca R., Minjárez–Sosa J. A., Math. Methods Oper. Res. 45 (1997), 2, to appear. Zbl 0882.90127, MR 1446409, DOI 10.1007/BF01193864
- Hasminskii R., Ibragimov I., Ann. of Statist. 18 (1990), 999–1010. Zbl 0705.62039, MR 1062695, DOI 10.1214/aos/1176347736
- Hernández–Lerma O., Adaptive Markov Control Processes, Springer–Verlag, New York 1989. Zbl 0698.90053, MR 0995463
- Hernández–Lerma O., Infinite–horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality, Reporte Interno 165. Departamento de Matemáticas, CINVESTAV-IPN, A.P. 14-740.07000, México, D. F., México (1994). (Submitted for publication)
- Hernández–Lerma O., Cavazos–Cadena R., Acta Appl. Math. 20 (1990), 285–307. Zbl 0717.93066, MR 1081591, DOI 10.1007/BF00049572
- Hernández–Lerma O., Lasserre J. B., Discrete–Time Markov Control Processes, Springer–Verlag, New York 1995. Zbl 0928.93002
- Hernández–Lerma O., Marcus S. I., J. Optim. Theory Appl. 46 (1985), 227–235. Zbl 0543.90093, MR 0794250, DOI 10.1007/BF00938426
- Hernández–Lerma O., Marcus S. I., Systems Control Lett. 9 (1987), 307–315. MR 0912683, DOI 10.1016/0167-6911(87)90055-7
- Hinderer K., Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer–Verlag, Berlin – Heidelberg – New York 1970. Zbl 0202.18401, MR 0267890
- Köthe G., Topological Vector Spaces I, Springer–Verlag, New York 1969. MR 0248498
- Kumar P. R., Varaiya P., Stochastic Systems: Estimation, Identification and Adaptive Control, Prentice–Hall, Englewood Cliffs 1986. Zbl 0706.93057
- Lippman S. A., Management Sci. 21 (1975), 1225–1233. Zbl 0309.90017, MR 0398535, DOI 10.1287/mnsc.21.11.1225
- Mandl P., Adv. in Appl. Probab. 6 (1974), 40–60. Zbl 0281.60070, MR 0339876, DOI 10.2307/1426206
- Rieder U., Manuscripta Math. 24 (1978), 115–131. Zbl 0385.28005, MR 0493590, DOI 10.1007/BF01168566
- Schäl M., Stochastics 20 (1987), 51–71. MR 0875814, DOI 10.1080/17442508708833435
- Stettner L., J. Appl. Math. Optim. 27 (1993), 161–177. Zbl 0769.93084, MR 1202530, DOI 10.1007/BF01195980
- Stettner L., Ergodic control of Markov process with mixed observation structure, Dissertationes Math. 341 (1995), 1–36. MR 1318335
- Nunen J. A. E. E. van, Wessels J., Management Sci. 24 (1978), 576–580. DOI 10.1287/mnsc.24.5.576
Citations in EuDML Documents
- J. Adolfo Minjárez-Sosa, Approximation and estimation in Markov control processes under a discounted criterion
- J. Minjárez-Sosa, Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion
- Yofre H. García, Saul Diaz-Infante, J. Adolfo Minjárez-Sosa, Partially observable queueing systems with controlled service rates under a discounted optimality criterion
- Beatris A. Escobedo-Trujillo, Carmen G. Higuera-Chan, Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
- E. Everardo Martinez-Garcia, J. Adolfo Minjárez-Sosa, Oscar Vega-Amaya, Partially observable Markov decision processes with partially observable random discount factors