Approximation and adaptive control of Markov processes: Average reward criterion

Onésimo Hernández-Lerma

Kybernetika (1987)

  • Volume: 23, Issue: 4, pages 265-288
  • ISSN: 0023-5954

How to cite

Hernández-Lerma, Onésimo. "Approximation and adaptive control of Markov processes: Average reward criterion." Kybernetika 23.4 (1987): 265-288. <http://eudml.org/doc/28802>.

@article{Hernández1987,
author = {Hernández-Lerma, Onésimo},
journal = {Kybernetika},
keywords = {average-reward controlled Markov processes; Borel state and control spaces; optimal adaptive policies; unknown parameters; approximation procedures; value-iteration},
language = {eng},
number = {4},
pages = {265-288},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Approximation and adaptive control of Markov processes: Average reward criterion},
url = {http://eudml.org/doc/28802},
volume = {23},
year = {1987},
}

TY - JOUR
AU - Hernández-Lerma, Onésimo
TI - Approximation and adaptive control of Markov processes: Average reward criterion
JO - Kybernetika
PY - 1987
PB - Institute of Information Theory and Automation AS CR
VL - 23
IS - 4
SP - 265
EP - 288
LA - eng
KW - average-reward controlled Markov processes; Borel state and control spaces; optimal adaptive policies; unknown parameters; approximation procedures; value-iteration
UR - http://eudml.org/doc/28802
ER -

References

  1. R. S. Acosta Abreu, Control of Markov chains with unknown parameters and metric state space, Submitted for publication. In Spanish. 
  2. R. S. Acosta Abreu, O. Hernández-Lerma, Iterative adaptive control of denumerable state average-cost Markov systems, Control Cybernet. 14 (1985), 313-322. (1985) MR0842780
  3. V. V. Baranov, Recursive algorithms of adaptive control in stochastic systems, Cybernetics 17 (1981), 815-824. (1981) MR0689427
  4. V. V. Baranov, A recursive algorithm in Markovian decision processes, Cybernetics 18 (1982), 499-506. (1982) Zbl0517.90089MR0712079
  5. D. P. Bertsekas, S. E. Shreve, Stochastic Optimal Control- The Discrete Time Case, Academic Press, New York 1978. (1978) Zbl0471.93002MR0511544
  6. A. Federgruen, P. J. Schweitzer, Nonstationary Markov decision problems with converging parameters, J. Optim. Theory Appl. 34 (1981), 207-241. (1981) Zbl0426.90091MR0625228
  7. A. Federgruen, H. C. Tijms, The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms, J. Appl. Probab. 15 (1978), 356-373. (1978) Zbl0386.90060MR0475896
  8. J. P. Georgin, Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré B 14 (1978), 255-277. (1978) MR0508929
  9. J. P. Georgin, Estimation et contrôle de chaînes de Markov sur des espaces arbitraires, In: Lecture Notes in Mathematics 636. Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1978, pp. 71-113. (1978) MR0498945
  10. E. I. Gordienko, Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504-518. (1985) Zbl0577.93067
  11. L. G. Gubenko, E. S. Statland, On controlled, discrete-time Markov decision processes, Theory Probab. Math. Statist. 7 (1975), 47-61. (1975) 
  12. O. Hernández-Lerma, Approximation and adaptive policies in discounted dynamic programming, Bol. Soc. Mat. Mexicana 30 (1985). In press. (1985) MR0886123
  13. O. Hernández-Lerma, Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435-445. (1985) MR0813610
  14. O. Hernández-Lerma, S. I. Marcus, Adaptive control of service in queueing systems, Syst. Control Lett. 3 (1983), 283-289. (1983) Zbl0534.90037MR0722958
  15. O. Hernández-Lerma, S. I. Marcus, Optimal adaptive control of priority assignment in queueing systems, Syst. Control Lett. 4 (1984), 65-75. (1984) MR0740208
  16. O. Hernández-Lerma, S. I. Marcus, Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution, Submitted for publication, 1986. (1986) MR0912683
  17. O. Hernández-Lerma, S. I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, Submitted for publication, 1986. (1986) 
  18. C. J. Himmelberg, T. Parthasarathy, F. S. Van Vleck, Optimal plans for dynamic programming problems, Math. Oper. Res. 1 (1976), 390-394. (1976) MR0444043
  19. K. Hinderer, Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer-Verlag, Berlin-Heidelberg-New York 1970. (1970) Zbl0202.18401MR0267890
  20. A. Hordijk, P. J. Schweitzer, H. Tijms, The asymptotic behaviour of the minimal total expected cost for the denumerable state Markov decision model, J. Appl. Probab. 12 (1975), 298-305. (1975) MR0378838
  21. P. R. Kumar, A survey of some results in stochastic adaptive control, SIAM J. Control Optim. 23 (1985), 329-380. (1985) Zbl0571.93038MR0784574
  22. M. Kurano, Discrete-time Markovian decision processes with an unknown parameter - average return criterion, J. Oper. Res. Soc. Japan 15 (1972), 67-76. (1972) Zbl0238.90006MR0343942
  23. M. Kurano, Average-optimal adaptive policies in semi-Markov decision processes including an unknown parameter, J. Oper. Res. Soc. Japan 28 (1985), 252-266. (1985) Zbl0579.90098MR0812416
  24. P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60. (1974) Zbl0281.60070MR0339876
  25. P. Mandl, On the adaptive control of countable Markov chains, In: Probability Theory, Banach Center Publications 5, PWN-Polish Scientific Publishers, Warsaw 1979, pp. 159-173. (1979) Zbl0439.60069MR0561478
  26. H. L. Royden, Real Analysis, Macmillan, New York 1968. (1968) MR0151555
  27. M. Schäl, Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Gebiete 32 (1975), 179-196. (1975) MR0378841
  28. M. Schäl, Estimation and control in discounted stochastic dynamic programming, Preprint No. 428, Institute for Applied Math., University of Bonn, Bonn 1981. (1981) MR0875814
  29. H. C. Tijms, On dynamic programming with arbitrary state space, compact action space and the average reward as criterion, Report BW 55/75, Mathematisch Centrum, Amsterdam 1975. (1975) 
  30. T. Ueno, Some limit theorems for temporally discrete Markov processes, J. Fac. Science, University of Tokyo 7 (1957), 449-462. (1957) Zbl0077.33201MR0090921
  31. D. J. White, Dynamic programming, Markov chains, and the method of successive approximations, J. Math. Anal. Appl. 6 (1963), 373-376. (1963) MR0148480
  32. P. Mandl, G. Hübner, Transient phenomena and self-optimizing control of Markov chains, Acta Universitatis Carolinae - Math. et Phys. 26 (1985), 1, 35-51. (1985) MR0830264
  33. A. Hordijk, H. Tijms, A modified form of the iterative method of dynamic programming, Ann. Statist. 3 (1975), 1, 203-208. (1975) Zbl0304.90115MR0378837
