EuDML | Browse

Page 1

Displaying 1 – 5 of 5

Bayesian estimation of the mean holding time in average semi-Markov control processes

J. Adolfo Minjárez-Sosa, José A. Montoya (2015)

Applicationes Mathematicae

We consider semi-Markov control models with Borel state and action spaces, possibly unbounded costs, and holding times with a generalized exponential distribution with unknown mean θ. Assuming that such a distribution does not depend on the state-action pairs, we introduce a Bayesian estimation procedure for θ, which combined with a variant of the vanishing discount factor approach yields average cost optimal policies.

Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

V. Borkar, S. Associate (1998)

Applicationes Mathematicae

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.

Bottom-up learning of hierarchical models in a class of deterministic POMDP environments

Hideaki Itoh, Hisao Fukumoto, Hiroshi Wakuya, Tatsuya Furukawa (2015)

International Journal of Applied Mathematics and Computer Science

The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden...

Bounds on discrete dynamic programming recursions. II. Polynomial bounds on problems with block-triangular structure

Karel Sladký (1981)

Kybernetika

Brève communication. Théorème de conservation des clients dans les files d'attente

B. Lemaire (1978)

RAIRO - Operations Research - Recherche Opérationnelle