Bayesian estimation of the mean holding time in average semi-Markov control processes
We consider semi-Markov control models with Borel state and action spaces, possibly unbounded costs, and holding times with a generalized exponential distribution with unknown mean θ. Assuming that such a distribution does not depend on the state-action pairs, we introduce a Bayesian estimation procedure for θ, which combined with a variant of the vanishing discount factor approach yields average cost optimal policies.