A stopping rule for discounted Markov decision processes with finite action sets

Raúl Montes-de-Oca; Enrique Lemus-Rodríguez; Daniel Cruz-Suárez

Kybernetika (2009)

  • Volume: 45, Issue: 5, page 755-767
  • ISSN: 0023-5954

Abstract

In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is attained is an issue of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
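For orientation, the sketch below shows plain value iteration on a toy cost-minimizing DMDP, stopped by the classical non-structural criterion ‖V_{k+1} − V_k‖ < ε(1 − γ)/(2γ). This is *not* the structural bound derived in the paper; it is the standard textbook rule the paper seeks to replace with a bound computable from structural properties of the process alone. The 2-state, 2-action MDP is purely illustrative.

```python
def value_iteration(P, c, gamma, eps=1e-6):
    """Minimize expected discounted cost by value iteration.

    P[a][s][t]: probability of moving s -> t under action a.
    c[a][s]:    one-stage cost of action a in state s.
    Stops with the classical criterion ||V_{k+1} - V_k|| < eps*(1-gamma)/(2*gamma),
    which guarantees the greedy policy is eps-optimal (a non-structural rule).
    """
    n = len(c[0])
    V = [0.0] * n
    threshold = eps * (1 - gamma) / (2 * gamma)
    while True:
        # One Bellman backup: Q[s][a] = c(a, s) + gamma * E[V(next state)]
        Q = [[c[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
              for a in range(len(c))] for s in range(n)]
        V_new = [min(Q[s]) for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < threshold:
            # Extract the greedy (myopic) policy with respect to V_new.
            policy = [min(range(len(c)), key=lambda a: Q[s][a]) for s in range(n)]
            return V_new, policy
        V = V_new

# Illustrative 2-state, 2-action MDP (hypothetical data).
P = [[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
     [[0.5, 0.5], [0.5, 0.5]]]   # transitions under action 1
c = [[1.0, 2.0],                 # costs under action 0
     [1.5, 1.5]]                 # costs under action 1
V, pi = value_iteration(P, c, gamma=0.9)
```

The paper's contribution, by contrast, is a bound on the number of iterations that depends only on quantities visible in the model itself (including the ergodicity index), so the loop above could be run for a precomputed number of steps rather than monitored for convergence.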

How to cite


Montes-de-Oca, Raúl, Lemus-Rodríguez, Enrique, and Cruz-Suárez, Daniel. "A stopping rule for discounted Markov decision processes with finite action sets." Kybernetika 45.5 (2009): 755-767. <http://eudml.org/doc/37701>.

@article{Montes2009,
abstract = {In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is attained is an issue of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.},
author = {Montes-de-Oca, Raúl and Lemus-Rodríguez, Enrique and Cruz-Suárez, Daniel},
journal = {Kybernetika},
keywords = {Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies},
language = {eng},
number = {5},
pages = {755-767},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A stopping rule for discounted Markov decision processes with finite action sets},
url = {http://eudml.org/doc/37701},
volume = {45},
year = {2009},
}

TY - JOUR
AU - Montes-de-Oca, Raúl
AU - Lemus-Rodríguez, Enrique
AU - Cruz-Suárez, Daniel
TI - A stopping rule for discounted Markov decision processes with finite action sets
JO - Kybernetika
PY - 2009
PB - Institute of Information Theory and Automation AS CR
VL - 45
IS - 5
SP - 755
EP - 767
AB - In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed until convergence is attained is an issue of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only on structural properties of the Markov Decision Process, under mild standard conditions and an additional "individuality" condition, which is of interest in its own right. It should be mentioned that other authors obtain such constants using non-structural information, i.e., information not immediately apparent from the Decision Process itself. The DMDP is required to fulfill an ergodicity condition, and the corresponding ergodicity index plays a critical role in the upper bound.
LA - eng
KW - Markov decision process; ergodicity condition; value iteration; discounted cost; optimal policy; myopic policies
UR - http://eudml.org/doc/37701
ER -

References

  1. Uniform convergence of the value iteration policies for discounted Markov decision processes, Bol. Soc. Mat. Mexicana 12 (2006), 133–148. MR2301750
  2. Conditions for the uniqueness of discounted Markov decision processes, Math. Methods Oper. Res. 60 (2004), 415–436. MR2106092
  3. Uniform approximations of discounted Markov decision processes to optimal policies, In: Proc. Prague Stochastics 2006 (M. Hušková and M. Janžura, eds.), MATFYZPRESS, Prague 2006, pp. 278–287. 
  4. O. Hernández-Lerma: Adaptive Markov Control Processes, Springer-Verlag, New York 1989. MR0995463
  5. Discrete–Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York 1996. MR1363487
  6. Further Topics on Discrete–Time Markov Control Processes, Springer-Verlag, New York 1999. MR1697198
  7. Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York 1994. Zbl 1184.90170, MR1270015
  8. Optimal stationary policies in general state Markov decision chains with finite action sets, Math. Oper. Res. 17 (1992), 901–909. MR1196400
  9. Recursive Methods in Economic Dynamics, Harvard University Press, USA 1989. MR1105087
