Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion

Evgueni I. Gordienko; J. Adolfo Minjárez-Sosa

Kybernetika (1998)

  • Volume: 34, Issue: 2, page [217]-234
  • ISSN: 0023-5954

Abstract

We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi_t)$, $t=0,1,\ldots$, with i.i.d. $\Re^k$-valued random vectors $\xi_t$ whose density $\rho$ is unknown. Assuming observability of $\xi_t$, we propose a procedure for statistical estimation of $\rho$ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
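
The setting in the abstract (a state recursion driven by an observable disturbance whose density is unknown and must be estimated from data) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the dynamics `F`, the constant action, the exponential disturbance and the Gaussian kernel estimator are all hypothetical choices made for illustration, the disturbance is one-dimensional rather than $\Re^k$-valued, and the kernel estimator is a stand-in, not the estimation procedure proposed in the paper.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian-kernel density estimate built from 1-D samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(s):
        return norm * sum(
            math.exp(-0.5 * ((s - xi) / bandwidth) ** 2) for xi in samples
        )

    return rho_hat


def F(x, a, xi):
    """Hypothetical dynamics (assumption): a storage-type system kept nonnegative."""
    return max(x + a - xi, 0.0)


def simulate(horizon, seed=0):
    """Run x_{t+1} = F(x_t, a_t, xi_t) and record the observed disturbances."""
    rng = random.Random(seed)
    x, observed = 1.0, []
    for _ in range(horizon):
        a = 0.5  # placeholder action; an adaptive policy would use the current estimate
        xi = rng.expovariate(1.0)  # illustrative "true" density, unknown to the controller
        observed.append(xi)
        x = F(x, a, xi)
    return x, observed


x_final, xis = simulate(2000)
rho_hat = gaussian_kde(xis, bandwidth=0.2)
print(rho_hat(1.0))  # estimate of the disturbance density at the point 1.0
```

In an adaptive scheme of the kind the abstract describes, the estimate would be refreshed as new disturbances are observed and the action would be chosen using the current estimate in place of the unknown density; the fixed action above is kept only to keep the sketch short.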

How to cite

Gordienko, Evgueni I., and Minjárez-Sosa, J. Adolfo. "Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion." Kybernetika 34.2 (1998): [217]-234. <http://eudml.org/doc/33349>.

@article{Gordienko1998,
abstract = {We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$, we propose a procedure for statistical estimation of $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.},
author = {Gordienko, Evgueni I. and Minjárez-Sosa, J. Adolfo},
journal = {Kybernetika},
keywords = {Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence},
language = {eng},
number = {2},
pages = {[217]-234},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion},
url = {http://eudml.org/doc/33349},
volume = {34},
year = {1998},
}

TY - JOUR
AU - Gordienko, Evgueni I.
AU - Minjárez-Sosa, J. Adolfo
TI - Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
JO - Kybernetika
PY - 1998
PB - Institute of Information Theory and Automation AS CR
VL - 34
IS - 2
SP - [217]
EP - 234
AB - We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$, we propose a procedure for statistical estimation of $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
LA - eng
KW - Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence
UR - http://eudml.org/doc/33349
ER -


Citations in EuDML Documents

  1. J. Adolfo Minjárez-Sosa, Approximation and estimation in Markov control processes under a discounted criterion
  2. J. Minjárez-Sosa, Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion
  3. Yofre H. García, Saul Diaz-Infante, J. Adolfo Minjárez-Sosa, Partially observable queueing systems with controlled service rates under a discounted optimality criterion
  4. Beatris A. Escobedo-Trujillo, Carmen G. Higuera-Chan, Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
  5. E. Everardo Martinez-Garcia, J. Adolfo Minjárez-Sosa, Oscar Vega-Amaya, Partially observable Markov decision processes with partially observable random discount factors
