Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

V. Borkar; S. Associate

Applicationes Mathematicae (1998)

Volume: 25, Issue: 3, page 339-358
ISSN: 1233-7234

Abstract

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.
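As a rough illustration of the kind of scheme the abstract describes, the following minimal sketch updates a posterior over a finite set of candidate parameters after each observed transition and then selects a "cost-biased" estimate. It is not the paper's construction: the finite parameter set, the `trans` table layout, the precomputed `optimal_cost` values, and the `sqrt(t + 1)` bias weight are all assumptions made for this example.

```python
import math

def update_log_posterior(log_post, trans, x, a, x_next):
    """Bayes update of a log-posterior after observing x -> x_next under action a.

    Assumed layout (hypothetical): trans[k][a][x][y] is the transition
    probability from state x to state y under action a when candidate
    parameter k is the true one.
    """
    new = []
    for k, lp in enumerate(log_post):
        p = trans[k][a][x][x_next]
        new.append(lp + math.log(p) if p > 0 else -math.inf)
    m = max(new)
    if m == -math.inf:
        raise ValueError("observed transition has zero probability under every candidate")
    # Normalize in log space (log-sum-exp) to avoid underflow on long trajectories.
    log_z = m + math.log(sum(math.exp(v - m) for v in new))
    return [v - log_z for v in new]

def cost_biased_estimate(log_post, optimal_cost, t):
    """Pick the candidate maximizing log-posterior minus a cost penalty.

    optimal_cost[k] is the optimal time-averaged cost if candidate k were true
    (assumed to be computed elsewhere). The weight sqrt(t + 1) is an assumed
    choice that grows more slowly than t, so the bias favors low-cost
    candidates without overwhelming the data in the long run.
    """
    bias = math.sqrt(t + 1)
    scores = [lp - bias * c for lp, c in zip(log_post, optimal_cost)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two candidate parameters, two states, one action (toy numbers).
trans = [
    [[[0.9, 0.1], [0.1, 0.9]]],  # candidate 0: "sticky" chain
    [[[0.5, 0.5], [0.5, 0.5]]],  # candidate 1: uniform chain
]
log_post = [math.log(0.5), math.log(0.5)]
log_post = update_log_posterior(log_post, trans, x=0, a=0, x_next=0)
estimate = cost_biased_estimate(log_post, optimal_cost=[1.0, 0.0], t=0)
```

In a self-tuning loop, the controller would apply at each step the policy that is optimal for the current estimate, observe the next state, and repeat the two calls above. With the toy numbers here, the data favor candidate 0, but the cost penalty tips the estimate toward the lower-cost candidate 1.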

How to cite

Borkar, V., and Associate, S. "Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost." Applicationes Mathematicae 25.3 (1998): 339-358. <http://eudml.org/doc/219208>.

@article{Borkar1998,
abstract = {This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.},
author = {Borkar, V. and Associate, S.},
journal = {Applicationes Mathematicae},
keywords = {time-averaged cost; adaptive control; asymptotic optimality; cost-biased estimate; Bayesian estimation},
language = {eng},
number = {3},
pages = {339-358},
title = {Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost},
url = {http://eudml.org/doc/219208},
volume = {25},
year = {1998},
}

TY - JOUR
AU - Borkar, V.
AU - Associate, S.
TI - Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost
JO - Applicationes Mathematicae
PY - 1998
VL - 25
IS - 3
SP - 339
EP - 358
AB - This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.
LA - eng
KW - time-averaged cost; adaptive control; asymptotic optimality; cost-biased estimate; Bayesian estimation
UR - http://eudml.org/doc/219208
ER -
