A stopping rule for discounted Markov decision processes with finite action sets
Raúl Montes-de-Oca; Enrique Lemus-Rodríguez; Daniel Cruz-Suárez
Kybernetika (2009)
- Volume: 45, Issue: 5, page 755-767
- ISSN: 0023-5954
Abstract

In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.

How to cite

Montes-de-Oca, Raúl, Lemus-Rodríguez, Enrique, and Cruz-Suárez, Daniel. "A stopping rule for discounted Markov decision processes with finite action sets." Kybernetika 45.5 (2009): 755-767. <http://eudml.org/doc/37701>.
@article{Montes2009,
abstract = {In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.},
author = {Montes-de-Oca, Raúl, Lemus-Rodríguez, Enrique, Cruz-Suárez, Daniel},
journal = {Kybernetika},
keywords = {Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies},
language = {eng},
number = {5},
pages = {755-767},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A stopping rule for discounted Markov decision processes with finite action sets},
url = {http://eudml.org/doc/37701},
volume = {45},
year = {2009},
}
TY - JOUR
AU - Montes-de-Oca, Raúl
AU - Lemus-Rodríguez, Enrique
AU - Cruz-Suárez, Daniel
TI - A stopping rule for discounted Markov decision processes with finite action sets
JO - Kybernetika
PY - 2009
PB - Institute of Information Theory and Automation AS CR
VL - 45
IS - 5
SP - 755
EP - 767
AB - In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
LA - eng
KW - Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies
UR - http://eudml.org/doc/37701
ER -
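To make the setting concrete, the following sketch runs value iteration on a tiny cost-minimizing DMDP with the classical sup-norm stopping criterion from the standard literature (once ||v_{n+1} - v_n|| < ε(1-γ)/(2γ), the greedy policy is ε-optimal). This is exactly the kind of non-structural, run-time criterion the abstract contrasts with the paper's a priori iteration bound; the transition and cost data here are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action DMDP (illustration only).
# P[a, s, s'] = transition probability under action a; c[s, a] = one-stage cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])
gamma = 0.9   # discount factor
eps = 1e-4    # desired precision of the resulting policy

# Classical sup-norm stopping rule: stop once successive iterates differ
# by less than eps*(1-gamma)/(2*gamma) in sup norm.
threshold = eps * (1 - gamma) / (2 * gamma)

v = np.zeros(2)
for n in range(10_000):
    # Bellman operator: Q[s, a] = c[s, a] + gamma * sum_{s'} P[a, s, s'] v[s']
    Q = c + gamma * (P @ v).T
    v_new = Q.min(axis=1)          # minimize expected discounted cost
    if np.max(np.abs(v_new - v)) < threshold:
        v = v_new
        break
    v = v_new

# Greedy (myopic) policy with respect to the final value iterate.
policy = (c + gamma * (P @ v).T).argmin(axis=1)
```

Note that the number of iterations executed here is only known after the fact; a structural bound of the kind derived in the paper would instead fix the iteration count in advance from properties of the process such as its ergodicity index.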
References
- Uniform convergence of the value iteration policies for discounted Markov decision processes, Bol. Soc. Mat. Mexicana 12 (2006), 133–148. MR2301750
- Conditions for the uniqueness of discounted Markov decision processes, Math. Methods Oper. Res. 60 (2004), 415–436. MR2106092
- Uniform approximations of discounted Markov decision processes to optimal policies, In: Proc. Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), MATFYZPRESS, Prague 2006, pp. 278–287.
- O. Hernández-Lerma: Adaptive Markov Control Processes, Springer-Verlag, New York 1989. MR0995463
- Discrete–Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York 1996. MR1363487
- Further Topics on Discrete–Time Markov Control Processes, Springer-Verlag, New York 1999. MR1697198
- Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York 1994. Zbl1184.90170 MR1270015
- Optimal stationary policies in general state Markov decision chains with finite action sets, Math. Oper. Res. 17 (1992), 901–909. MR1196400
- Recursive Methods in Economic Dynamics, Harvard University Press, USA 1989. MR1105087