Estimation and control in finite Markov decision processes with the average reward criterion
Rolando Cavazos-Cadena; Raúl Montes-de-Oca
Applicationes Mathematicae (2004)
- Volume: 31, Issue: 2, pages 127-154
- ISSN: 1233-7234
Abstract

This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs, and the non-stationary value iteration scheme is used to select an optimal adaptive policy within that family.

How to cite
Rolando Cavazos-Cadena, and Raúl Montes-de-Oca. "Estimation and control in finite Markov decision processes with the average reward criterion." Applicationes Mathematicae 31.2 (2004): 127-154. <http://eudml.org/doc/279704>.
@article{RolandoCavazos2004,
abstract = {This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs, and the non-stationary value iteration scheme is used to select an optimal adaptive policy within that family.},
author = {Rolando Cavazos-Cadena, Raúl Montes-de-Oca},
journal = {Applicationes Mathematicae},
keywords = {optimality equation; Schweitzer's transformation; convergence of non-stationary successive approximations; strong law of large numbers},
language = {eng},
number = {2},
pages = {127-154},
title = {Estimation and control in finite Markov decision processes with the average reward criterion},
url = {http://eudml.org/doc/279704},
volume = {31},
year = {2004},
}
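The abstract refers to frequency estimators of the transition law, which become consistent on the essential set of admissible state-action pairs, i.e. the pairs visited infinitely often under the policies in the identified class. Below is a minimal sketch of such a relative-frequency estimator for a finite model with integer-indexed states and actions; the class name, the uniform default for never-visited pairs, and all other identifiers are illustrative assumptions, not the authors' construction.

import numpy as np

class FrequencyEstimator:
    """Relative-frequency estimates of an unknown transition law
    p(. | s, a) in a finite MDP (illustrative sketch)."""

    def __init__(self, n_states, n_actions):
        # counts[s, a, s'] = number of observed transitions (s, a) -> s'
        self.counts = np.zeros((n_states, n_actions, n_states))

    def observe(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[s, a, s_next] += 1

    def estimate(self):
        """Empirical distribution for each visited pair (s, a);
        an arbitrary uniform guess where the pair was never visited."""
        totals = self.counts.sum(axis=2, keepdims=True)
        n_states = self.counts.shape[0]
        with np.errstate(invalid="ignore", divide="ignore"):
            p_hat = np.where(totals > 0, self.counts / totals, 1.0 / n_states)
        return p_hat

# Example: a single observed transition (s=0, a=1) -> s'=2.
est = FrequencyEstimator(n_states=3, n_actions=2)
est.observe(0, 1, 2)
p_hat = est.estimate()   # p_hat[0, 1] is [0, 0, 1]; unvisited pairs stay uniform

By the strong law of large numbers (one of the paper's keywords), p_hat(. | s, a) converges to the true kernel along any trajectory that visits (s, a) infinitely often, which is the consistency-on-an-essential-set property stated in the abstract.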
TY - JOUR
AU - Rolando Cavazos-Cadena
AU - Raúl Montes-de-Oca
TI - Estimation and control in finite Markov decision processes with the average reward criterion
JO - Applicationes Mathematicae
PY - 2004
VL - 31
IS - 2
SP - 127
EP - 154
AB - This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs, and the non-stationary value iteration scheme is used to select an optimal adaptive policy within that family.
LA - eng
KW - optimality equation; Schweitzer's transformation; convergence of non-stationary successive approximations; strong law of large numbers
UR - http://eudml.org/doc/279704
ER -
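The keywords also mention Schweitzer's transformation: replacing each kernel p(. | s, a) by p_tau(. | s, a) = (1 - tau) delta_s + tau p(. | s, a) for some tau in (0, 1) leaves the stationary distributions, and hence the average rewards, unchanged while making every kernel aperiodic, which is what allows successive approximations to converge. The sketch below is a standard stationary relative value iteration on the transformed model with a fixed transition law; the paper's non-stationary scheme instead re-applies each step with the current frequency estimates. Function and parameter names are illustrative.

import numpy as np

def relative_value_iteration(r, p, tau=0.5, n_iter=1000):
    """r: rewards, shape (S, A); p: transition law, shape (S, A, S).
    Returns an estimate of the optimal gain g, a relative value
    function h, and a greedy stationary policy (illustrative sketch)."""
    n_states, n_actions = r.shape
    # Schweitzer's transformation: mix each kernel with staying put.
    eye = np.eye(n_states)
    p_tau = (1.0 - tau) * eye[:, None, :] + tau * p
    h = np.zeros(n_states)
    for _ in range(n_iter):
        q = r + p_tau @ h    # Q-values under current h, shape (S, A)
        v = q.max(axis=1)    # one step of successive approximation
        h = v - v[0]         # subtract a reference state to keep iterates bounded
    g = v[0]                 # at a fixed point, v[0] equals the optimal gain
    return g, h, q.argmax(axis=1)

Under the simultaneous Doeblin condition assumed in the paper, the normalized iterates h remain bounded and v[0] approaches the optimal average reward; in the adaptive setting the same recursion is driven by the frequency estimates p_hat sketched above instead of the true transition law p.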