Displaying similar documents to “Identification of optimal policies in Markov decision processes”

A stopping rule for discounted Markov decision processes with finite action sets

Raúl Montes-de-Oca, Enrique Lemus-Rodríguez, Daniel Cruz-Suárez (2009)

Kybernetika

In a Discounted Markov Decision Process (DMDP) with finite action sets, the Value Iteration Algorithm, under suitable conditions, yields an optimal policy in a finite number of steps. Determining an upper bound on the number of steps required for convergence is of great theoretical and practical interest, since such a bound would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound...
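
To make the role of a stopping rule concrete, here is a minimal sketch of value iteration on a small finite DMDP. It does not reproduce the bound from the paper, which is not given in the excerpt above. Instead it assumes the standard textbook criterion: stop once successive iterates differ by less than eps(1 - gamma)/(2 gamma) in the sup norm, at which point the greedy policy is eps-optimal. The function name `value_iteration`, the array layout of `P` and `R`, and the two-state example are all hypothetical choices made for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma, eps=1e-8, max_iter=10_000):
    """Value iteration with a sup-norm stopping rule (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = probability of moving s -> t under action a.
    R: array of shape (S, A), R[s, a] = one-step reward.
    gamma: discount factor in (0, 1).
    Returns (value estimate, greedy policy, number of iterations performed).
    """
    V = np.zeros(R.shape[0])
    # Assumed textbook threshold, not the bound derived in the paper.
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    for n in range(1, max_iter + 1):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < threshold:
            return V_new, Q.argmax(axis=1), n
        V = V_new
    return V, Q.argmax(axis=1), max_iter


# Hypothetical two-state, two-action example:
# action 0 stays in the current state, action 1 switches states.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 1.0],  # state 0: staying pays 0, switching pays 1
    [2.0, 0.0],  # state 1: staying pays 2, switching pays 0
])

V, policy, n_iter = value_iteration(P, R, gamma=0.9)
print(policy, V, n_iter)
```

In this example the optimal policy switches out of state 0 and stays in state 1, with optimal values 19 and 20. A criterion of this kind guarantees eps-optimality of the returned policy; a bound of the type the abstract describes would instead give, in advance, a number of iterations after which the greedy policy is exactly optimal.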