Estimation and control in finite Markov decision processes with the average reward criterion

Rolando Cavazos-Cadena; Raúl Montes-de-Oca

Applicationes Mathematicae (2004)

  • Volume: 31, Issue: 2, pages 127-154
  • ISSN: 1233-7234

Abstract

This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs, and the non-stationary value iteration scheme is used to select an optimal adaptive policy within that family.
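The two ingredients named in the abstract, relative-frequency estimation of the transition law and value iteration carried out on the estimated model, can be illustrated on a toy problem. The sketch below is only a simplified illustration under assumed ingredients (a made-up 3-state, 2-action chain, uniform random exploration, and plain relative value iteration on the final estimate); it is not the paper's adaptive scheme, which interleaves estimation and control and restricts attention to an essential set of admissible state-action pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP with 3 states and 2 actions (a hypothetical example, not the
# model studied in the paper).
n_states, n_actions = 3, 2
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
rewards = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Frequency estimator: count observed transitions for each (s, a) pair.
counts = np.zeros((n_states, n_actions, n_states))

def estimated_P():
    """Relative-frequency estimate of the transition law; unvisited
    (s, a) pairs fall back to the uniform distribution so every row
    stays a probability vector."""
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)

# Simulate under uniformly random actions so that every state-action
# pair is visited often; consistency of the frequency estimator hinges
# on sufficiently frequent visits to the relevant pairs.
s = 0
for _ in range(20000):
    a = rng.integers(n_actions)
    s_next = rng.choice(n_states, p=true_P[s, a])
    counts[s, a, s_next] += 1
    s = s_next

# Relative value iteration on the *estimated* model: each sweep uses the
# transition estimate in place of the unknown true law.
P_hat = estimated_P()
h = np.zeros(n_states)
for _ in range(500):
    q = rewards + P_hat @ h          # (n_states, n_actions) Q-values
    h = q.max(axis=1)
    h -= h[0]                        # state 0 as the reference state

gain = (rewards + P_hat @ h).max(axis=1)[0]    # estimated optimal gain
greedy = (rewards + P_hat @ h).argmax(axis=1)  # greedy policy on the estimate
```

With enough observations per pair the frequency estimate concentrates around the true transition law, so the greedy policy computed from it approximates an average-reward optimal policy; the paper's contribution is to achieve this while estimation and control run simultaneously.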

How to cite


Rolando Cavazos-Cadena, and Raúl Montes-de-Oca. "Estimation and control in finite Markov decision processes with the average reward criterion." Applicationes Mathematicae 31.2 (2004): 127-154. <http://eudml.org/doc/279704>.

@article{RolandoCavazos2004,
abstract = {This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs, and the non-stationary value iteration scheme is used to select an optimal adaptive policy within that family.},
author = {Rolando Cavazos-Cadena, Raúl Montes-de-Oca},
journal = {Applicationes Mathematicae},
keywords = {optimality equation; Schweitzer's transformation; convergence of non-stationary successive approximations; strong law of large numbers},
language = {eng},
number = {2},
pages = {127-154},
title = {Estimation and control in finite Markov decision processes with the average reward criterion},
url = {http://eudml.org/doc/279704},
volume = {31},
year = {2004},
}

TY - JOUR
AU - Rolando Cavazos-Cadena
AU - Raúl Montes-de-Oca
TI - Estimation and control in finite Markov decision processes with the average reward criterion
JO - Applicationes Mathematicae
PY - 2004
VL - 31
IS - 2
SP - 127
EP - 154
AB - This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs, and the non-stationary value iteration scheme is used to select an optimal adaptive policy within that family.
LA - eng
KW - optimality equation; Schweitzer's transformation; convergence of non-stationary successive approximations; strong law of large numbers
UR - http://eudml.org/doc/279704
ER -
