Approximation and adaptive control of Markov processes: Average reward criterion
Kybernetika (1987)
- Volume: 23, Issue: 4, pages 265-288
- ISSN: 0023-5954
How to cite
Hernández-Lerma, Onésimo. "Approximation and adaptive control of Markov processes: Average reward criterion." Kybernetika 23.4 (1987): 265-288. <http://eudml.org/doc/28802>.
@article{Hernández1987,
author = {Hernández-Lerma, Onésimo},
journal = {Kybernetika},
keywords = {average-reward controlled Markov processes; Borel state and control spaces; optimal adaptive policies; unknown parameters; approximation procedures; value-iteration},
language = {eng},
number = {4},
pages = {265-288},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Approximation and adaptive control of Markov processes: Average reward criterion},
url = {http://eudml.org/doc/28802},
volume = {23},
year = {1987},
}
TY - JOUR
AU - Hernández-Lerma, Onésimo
TI - Approximation and adaptive control of Markov processes: Average reward criterion
JO - Kybernetika
PY - 1987
PB - Institute of Information Theory and Automation AS CR
VL - 23
IS - 4
SP - 265
EP - 288
LA - eng
KW - average-reward controlled Markov processes; Borel state and control spaces; optimal adaptive policies; unknown parameters; approximation procedures; value-iteration
UR - http://eudml.org/doc/28802
ER -
References
- R. S. Acosta Abreu, Control of Markov chains with unknown parameters and metric state space, Submitted for publication. In Spanish.
- R. S. Acosta Abreu, O. Hernández-Lerma, Iterative adaptive control of denumerable state average-cost Markov systems, Control Cybernet. 14 (1985), 313-322. (1985) MR0842780
- V. V. Baranov, Recursive algorithms of adaptive control in stochastic systems, Cybernetics 17 (1981), 815-824. (1981) MR0689427
- V. V. Baranov, A recursive algorithm in markovian decision processes, Cybernetics 18 (1982), 499-506. (1982) Zbl0517.90089 MR0712079
- D. P. Bertsekas, S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York 1978. (1978) Zbl0471.93002 MR0511544
- A. Federgruen, P. J. Schweitzer, Nonstationary Markov decision problems with converging parameters, J. Optim. Theory Appl. 34 (1981), 207-241. (1981) Zbl0426.90091 MR0625228
- A. Federgruen, H. C. Tijms, The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms, J. Appl. Probab. 15 (1978), 356-373. (1978) Zbl0386.90060 MR0475896
- J. P. Georgin, Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré B 14 (1978), 255-277. (1978) MR0508929
- J. P. Georgin, Estimation et contrôle de chaînes de Markov sur des espaces arbitraires, In: Lecture Notes in Mathematics 636. Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1978, pp. 71-113. (1978) MR0498945
- E. I. Gordienko, Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504-518. (1985) Zbl0577.93067
- L. G. Gubenko, E. S. Statland, On controlled, discrete-time Markov decision processes, Theory Probab. Math. Statist. 7 (1975), 47-61. (1975)
- O. Hernández-Lerma, Approximation and adaptive policies in discounted dynamic programming, Bol. Soc. Mat. Mexicana 30 (1985). In press. (1985) MR0886123
- O. Hernández-Lerma, Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435-445. (1985) MR0813610
- O. Hernández-Lerma, S. I. Marcus, Adaptive control of service in queueing systems, Syst. Control Lett. 3 (1983), 283-289. (1983) Zbl0534.90037 MR0722958
- O. Hernández-Lerma, S. I. Marcus, Optimal adaptive control of priority assignment in queueing systems, Syst. Control Lett. 4 (1984), 65-75. (1984) MR0740208
- O. Hernández-Lerma, S. I. Marcus, Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution, Submitted for publication, 1986. (1986) MR0912683
- O. Hernández-Lerma, S. I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, Submitted for publication, 1986. (1986)
- C. J. Himmelberg, T. Parthasarathy, F. S. Van Vleck, Optimal plans for dynamic programming problems, Math. Oper. Res. 1 (1976), 390-394. (1976) MR0444043
- K. Hinderer, Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer-Verlag, Berlin-Heidelberg-New York 1970. (1970) Zbl0202.18401 MR0267890
- A. Hordijk, P. J. Schweitzer, H. Tijms, The asymptotic behaviour of the minimal total expected cost for the denumerable state Markov decision model, J. Appl. Probab. 12 (1975), 298-305. (1975) MR0378838
- P. R. Kumar, A survey of some results in stochastic adaptive control, SIAM J. Control Optim. 23 (1985), 329-380. (1985) Zbl0571.93038 MR0784574
- M. Kurano, Discrete-time markovian decision processes with an unknown parameter - average return criterion, J. Oper. Res. Soc. Japan 15 (1972), 67-76. (1972) Zbl0238.90006 MR0343942
- M. Kurano, Average-optimal adaptive policies in semi-Markov decision processes including an unknown parameter, J. Oper. Res. Soc. Japan 28 (1985), 252-366. (1985) Zbl0579.90098 MR0812416
- P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60. (1974) Zbl0281.60070 MR0339876
- P. Mandl, On the adaptive control of countable Markov chains, In: Probability Theory, Banach Centre Publications 5, PWN-Polish Scientific Publishers, Warsaw 1979, pp. 159-173. (1979) Zbl0439.60069 MR0561478
- H. L. Royden, Real Analysis, Macmillan, New York 1968. (1968) MR0151555
- M. Schäl, Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Gebiete 32 (1975), 179-196. (1975) MR0378841
- M. Schäl, Estimation and control in discounted stochastic dynamic programming, Preprint No. 428, Institute for Applied Math., University of Bonn, Bonn 1981. (1981) MR0875814
- H. C. Tijms, On dynamic programming with arbitrary state space, compact action space and the average reward as criterion, Report BW 55/75, Mathematisch Centrum, Amsterdam 1975. (1975)
- T. Ueno, Some limit theorems for temporally discrete Markov processes, J. Fac. Science, University of Tokyo 7 (1957), 449-462. (1957) Zbl0077.33201 MR0090921
- D. J. White, Dynamic programming, Markov chains, and the method of successive approximations, J. Math. Anal. Appl. 6 (1963), 373-376. (1963) MR0148480
- P. Mandl, G. Hübner, Transient phenomena and self-optimizing control of Markov chains, Acta Universitatis Carolinae - Math. et Phys. 26 (1985), 1, 35-51. (1985) MR0830264
- A. Hordijk, H. Tijms, A modified form of the iterative method of dynamic programming, Ann. Statist. 3 (1975), 1, 203-208. (1975) Zbl0304.90115 MR0378837