On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms

Ewa Drabik

Applicationes Mathematicae (1996)

  • Volume: 23, Issue: 4, page 449-473
  • ISSN: 1233-7234

Abstract

Two kinds of strategies for a multiarmed Markov bandit problem with controlled arms are considered: a strategy with forcing and a strategy with randomization. The choice of arm and control function in both cases is based on the current value of the average cost per unit time functional. Some simulation results are also presented.

How to cite

Drabik, Ewa. "On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms." Applicationes Mathematicae 23.4 (1996): 449-473. <http://eudml.org/doc/219145>.

@article{Drabik1996,
abstract = {Two kinds of strategies for a multiarmed Markov bandit problem with controlled arms are considered: a strategy with forcing and a strategy with randomization. The choice of arm and control function in both cases is based on the current value of the average cost per unit time functional. Some simulation results are also presented.},
author = {Drabik, Ewa},
journal = {Applicationes Mathematicae},
keywords = {selfoptimizing strategies; adaptative control; invariant measure; multiarmed bandit; stochastic control; adaptive control; multiarmed Markov bandit problem},
language = {eng},
number = {4},
pages = {449-473},
title = {On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms},
url = {http://eudml.org/doc/219145},
volume = {23},
year = {1996},
}

TY - JOUR
AU - Drabik, Ewa
TI - On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms
JO - Applicationes Mathematicae
PY - 1996
VL - 23
IS - 4
SP - 449
EP - 473
AB - Two kinds of strategies for a multiarmed Markov bandit problem with controlled arms are considered: a strategy with forcing and a strategy with randomization. The choice of arm and control function in both cases is based on the current value of the average cost per unit time functional. Some simulation results are also presented.
LA - eng
KW - selfoptimizing strategies; adaptative control; invariant measure; multiarmed bandit; stochastic control; adaptive control; multiarmed Markov bandit problem
UR - http://eudml.org/doc/219145
ER -

References

  [1] R. Agrawal, Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition, J. Appl. Probab. 28 (1991), 779-790. Zbl 0741.60070
  [2] R. Agrawal and D. Teneketzis, Certainty equivalence control with forcing: revisited, Systems Control Lett. 13 (1989), 405-412.
  [3] V. Anantharam, P. Varaiya and J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - Part I: i.i.d. rewards, IEEE Trans. Automat. Control AC-32 (11) (1987), 969-977. Zbl 0632.93067
  [4] V. Anantharam, P. Varaiya and J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - Part II: Markovian rewards, ibid., 977-983. Zbl 0639.93053
  [5] W. Feller, An Introduction to Probability Theory and its Applications, Vol. II, Wiley, New York, 1966. Zbl 0138.10207
  [6] J. C. Gittins, Multi-armed Bandit Allocation Indices, Wiley, 1989.
  [7] K. D. Glazebrook, On a sufficient condition for superprocesses due to Whittle, J. Appl. Probab. 19 (1982), 99-110. Zbl 0484.90091
  [8] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, 1989.
  [9] Ł. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, Appl. Math. Optim. 27 (1993), 161-177. Zbl 0769.93084
