Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

V. Borkar; S. Associate

Applicationes Mathematicae (1998)

  • Volume: 25, Issue: 3, pages 339-358
  • ISSN: 1233-7234

Abstract

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.

How to cite

Borkar, V., and Associate, S. "Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost." Applicationes Mathematicae 25.3 (1998): 339-358. <http://eudml.org/doc/219208>.

@article{Borkar1998,
abstract = {This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.},
author = {Borkar, V. and Associate, S.},
journal = {Applicationes Mathematicae},
keywords = {time-averaged cost; adaptive control; asymptotic optimality; cost-biased estimate; Bayesian estimation},
language = {eng},
number = {3},
pages = {339-358},
title = {Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost},
url = {http://eudml.org/doc/219208},
volume = {25},
year = {1998},
}

TY - JOUR
AU - Borkar, V.
AU - Associate, S.
TI - Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost
JO - Applicationes Mathematicae
PY - 1998
VL - 25
IS - 3
SP - 339
EP - 358
AB - This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.
LA - eng
KW - time-averaged cost; adaptive control; asymptotic optimality; cost-biased estimate; Bayesian estimation
UR - http://eudml.org/doc/219208
ER -

References

  [1] R. Agrawal, D. Teneketzis and V. Anantharam, Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, IEEE Trans. Automatic Control AC-34 (1989), 1249-1259. Zbl0689.93039
  [2] A. Barron, Are Bayes rules consistent in information?, in: Problems in Communication and Computation, T. M. Cover and B. Gopinath (eds.), Springer, New York, 1987, 85-91.
  [3] R. N. Bhattacharya, Asymptotic behaviour of several dimensional diffusions, in: Stochastic Nonlinear Systems, L. Arnold and R. Lefever (eds.), Springer, New York, 1981, 86-91.
  [4] D. Blackwell and L. Dubins, Merging of opinions with increasing information, Ann. Math. Statist. 33 (1962), 882-887. Zbl0109.35704
  [5] V. S. Borkar, Control of Markov chains with long run average cost criterion, in: Stochastic Differential Systems, Stochastic Control Theory and Applications, W. H. Fleming and P. L. Lions (eds.), Springer, New York, 1987, 57-77.
  [6] V. S. Borkar, The Kumar-Becker-Lin scheme revisited, J. Optim. Theory Appl. 66 (1990), 289-309. Zbl0682.93060
  [7] V. S. Borkar, Self-tuning control of diffusions without the identifiability condition, ibid. 68 (1991), 117-137. Zbl0697.93036
  [8] V. S. Borkar, On the Milito-Cruz adaptive control scheme for Markov chains, ibid. 77 (1993), 387-397. Zbl0791.93055
  [9] V. S. Borkar, A modified self-tuner for controlled diffusions with an unknown parameter, in: Mathematical Theory of Control (Bombay, 1990), A. V. Balakrishnan and M. C. Joshi (eds.), Marcel Dekker, 1992, 57-67. Zbl0790.93082
  [10] V. S. Borkar and M. K. Ghosh, Ergodic and adaptive control of nearest neighbour motions, Math. Control Signals and Systems 4 (1991), 81-98. Zbl0736.93078
  [11] V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions II: adaptive control, Appl. Math. Optim. 21 (1990), 191-220. Zbl0691.93027
  [12] V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains I: finite parameter case, IEEE Trans. Automatic Control 24 (1979), 953-957. Zbl0416.93065
  [13] V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains, SIAM J. Control Optim. 20 (1982), 470-488. Zbl0491.93063
  [14] E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control 39 (1994), 1400-1410. Zbl0806.93058
  [15] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer, New York, 1979.
  [16] G. B. Di Masi and Ł. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics Stochastic Reports 54 (1995), 301-316. Zbl0855.93103
  [17] B. Doshi and S. E. Shreve, Randomized self-tuning control of Markov chains, J. Appl. Probab. 17 (1980), 726-734. Zbl0442.93054
  [18] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Adv. Appl. Probab. 14 (1982), 502-525. Zbl0495.60094
  [19] P. R. Kumar and A. Becker, A new family of optimal adaptive controllers for Markov chains, IEEE Trans. Automatic Control 27 (1982), 137-142. Zbl0471.93069
  [20] P. R. Kumar and W. Lin, Optimal adaptive controllers for Markov chains, ibid. 27 (1982), 756-774. Zbl0488.93036
  [21] P. R. Kumar and P. P. Varaiya, Stochastic Systems--Estimation, Identification and Adaptive Control, Prentice-Hall, 1986. Zbl0706.93057
  [22] P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60. Zbl0281.60070
  [23] R. Milito and J. B. Cruz, Jr., An optimization oriented approach to adaptive control of Markov chains, IEEE Trans. Automatic Control 32 (1987), 754-762. Zbl0632.93080
  [24] J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning 16 (1994), 195-202. Zbl0820.68105
  [25] K. Van Hee, Bayesian Control of Markov Chains, Math. Center Tracts, 95, Math. Center, Amsterdam, 1978.
