Approximation and adaptive control of Markov processes: Average reward criterion

Onésimo Hernández-Lerma

Kybernetika (1987)

  • Volume: 23, Issue: 4, pages 265-288
  • ISSN: 0023-5954

How to cite

Hernández-Lerma, Onésimo. "Approximation and adaptive control of Markov processes: Average reward criterion." Kybernetika 23.4 (1987): 265-288. <http://eudml.org/doc/28802>.

@article{Hernández1987,
author = {Hernández-Lerma, Onésimo},
journal = {Kybernetika},
keywords = {average-reward controlled Markov processes; Borel state and control spaces; optimal adaptive policies; unknown parameters; approximation procedures; value-iteration},
language = {eng},
number = {4},
pages = {265-288},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Approximation and adaptive control of Markov processes: Average reward criterion},
url = {http://eudml.org/doc/28802},
volume = {23},
year = {1987},
}

TY - JOUR
AU - Hernández-Lerma, Onésimo
TI - Approximation and adaptive control of Markov processes: Average reward criterion
JO - Kybernetika
PY - 1987
PB - Institute of Information Theory and Automation AS CR
VL - 23
IS - 4
SP - 265
EP - 288
LA - eng
KW - average-reward controlled Markov processes; Borel state and control spaces; optimal adaptive policies; unknown parameters; approximation procedures; value-iteration
UR - http://eudml.org/doc/28802
ER -

References

  1. R. S. Acosta Abreu, Control of Markov chains with unknown parameters and metric state space, Submitted for publication. In Spanish. 
  2. R. S. Acosta Abreu, O. Hernández-Lerma, Iterative adaptive control of denumerable state average-cost Markov systems, Control Cybernet. 14 (1985), 313-322. (1985) MR0842780
  3. V. V. Baranov, Recursive algorithms of adaptive control in stochastic systems, Cybernetics 17 (1981), 815-824. (1981) MR0689427
  4. V. V. Baranov, A recursive algorithm in Markovian decision processes, Cybernetics 18 (1982), 499-506. (1982) Zbl0517.90089MR0712079
  5. D. P. Bertsekas, S. E. Shreve, Stochastic Optimal Control- The Discrete Time Case, Academic Press, New York 1978. (1978) Zbl0471.93002MR0511544
  6. A. Federgruen, P. J. Schweitzer, Nonstationary Markov decision problems with converging parameters, J. Optim. Theory Appl. 34 (1981), 207-241. (1981) Zbl0426.90091MR0625228
  7. A. Federgruen, H. C. Tijms, The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms, J. Appl. Probab. 15 (1978), 356-373. (1978) Zbl0386.90060MR0475896
  8. J. P. Georgin, Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré B 14 (1978), 255-277. (1978) MR0508929
  9. J. P. Georgin, Estimation et contrôle de chaînes de Markov sur des espaces arbitraires, In: Lecture Notes in Mathematics 636. Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1978, pp. 71-113. (1978) MR0498945
  10. E. I. Gordienko, Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504-518. (1985) Zbl0577.93067
  11. L. G. Gubenko, E. S. Statland, On controlled, discrete-time Markov decision processes, Theory Probab. Math. Statist. 7 (1975), 47-61. (1975) 
  12. O. Hernández-Lerma, Approximation and adaptive policies in discounted dynamic programming, Bol. Soc. Mat. Mexicana 30 (1985). In press. (1985) MR0886123
  13. O. Hernández-Lerma, Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435-445. (1985) MR0813610
  14. O. Hernández-Lerma, S. I. Marcus, Adaptive control of service in queueing systems, Syst. Control Lett. 3 (1983), 283-289. (1983) Zbl0534.90037MR0722958
  15. O. Hernández-Lerma, S. I. Marcus, Optimal adaptive control of priority assignment in queueing systems, Syst. Control Lett. 4 (1984), 65-75. (1984) MR0740208
  16. O. Hernández-Lerma, S. I. Marcus, Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution, Submitted for publication, 1986. (1986) MR0912683
  17. O. Hernández-Lerma, S. I. Marcus, Nonparametric adaptive control of discrete-time partially observable stochastic systems, Submitted for publication, 1986. (1986) 
  18. C. J. Himmelberg, T. Parthasarathy, F. S. Van Vleck, Optimal plans for dynamic programming problems, Math. Oper. Res. 1 (1976), 390-394. (1976) MR0444043
  19. K. Hinderer, Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, (Lecture Notes in Operations Research and Mathematical Systems 33.) Springer-Verlag, Berlin-Heidelberg-New York 1970. (1970) Zbl0202.18401MR0267890
  20. A. Hordijk, P. J. Schweitzer, H. Tijms, The asymptotic behaviour of the minimal total expected cost for the denumerable state Markov decision model, J. Appl. Probab. 12 (1975), 298-305. (1975) MR0378838
  21. P. R. Kumar, A survey of some results in stochastic adaptive control, SIAM J. Control Optim. 23 (1985), 329-380. (1985) Zbl0571.93038MR0784574
  22. M. Kurano, Discrete-time Markovian decision processes with an unknown parameter - average return criterion, J. Oper. Res. Soc. Japan 15 (1972), 67-76. (1972) Zbl0238.90006MR0343942
  23. M. Kurano, Average-optimal adaptive policies in semi-Markov decision processes including an unknown parameter, J. Oper. Res. Soc. Japan 28 (1985), 252-266. (1985) Zbl0579.90098MR0812416
  24. P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60. (1974) Zbl0281.60070MR0339876
  25. P. Mandl, On the adaptive control of countable Markov chains, In: Probability Theory, Banach Center Publications 5, PWN-Polish Scientific Publishers, Warsaw 1979, pp. 159-173. (1979) Zbl0439.60069MR0561478
  26. H. L. Royden, Real Analysis, Macmillan, New York 1968. (1968) MR0151555
  27. M. Schäl, Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Gebiete 32 (1975), 179-196. (1975) MR0378841
  28. M. Schäl, Estimation and control in discounted stochastic dynamic programming, Preprint No. 428, Institute for Applied Math., University of Bonn, Bonn 1981. (1981) MR0875814
  29. H. C. Tijms, On dynamic programming with arbitrary state space, compact action space and the average reward as criterion, Report BW 55/75, Mathematisch Centrum, Amsterdam 1975. (1975) 
  30. T. Ueno, Some limit theorems for temporally discrete Markov processes, J. Fac. Science, University of Tokyo 7 (1957), 449-462. (1957) Zbl0077.33201MR0090921
  31. D. J. White, Dynamic programming, Markov chains, and the method of successive approximations, J. Math. Anal. Appl. 6 (1963), 373-376. (1963) MR0148480
  32. P. Mandl, G. Hübner, Transient phenomena and self-optimizing control of Markov chains, Acta Universitatis Carolinae - Math. et Phys. 26 (1985), 1, 35-51. (1985) MR0830264
  33. A. Hordijk, H. Tijms, A modified form of the iterative method of dynamic programming, Ann. Statist. 3 (1975), 1, 203-208. (1975) Zbl0304.90115MR0378837
