Estimation of hidden Markov models for a partially observed risk sensitive control problem

Bernard Frankpitt; John S. Baras

Displaying similar documents to “Estimation of hidden Markov models for a partially observed risk sensitive control problem”

Recursive self-tuning control of finite Markov chains

Vivek Borkar (1997)

Applicationes Mathematicae

A recursive self-tuning control scheme for finite Markov chains is proposed in which the unknown parameter is estimated by a stochastic approximation scheme that maximizes the log-likelihood function, and the control is obtained via a relative value iteration algorithm. The analysis uses the asymptotic ordinary differential equations (o.d.e.s) associated with these two schemes.
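
The relative value iteration step named in this abstract can be illustrated with a minimal sketch. It shows only the textbook average-cost recursion for a finite chain with known transition probabilities; it is not the scheme of the cited paper, which additionally estimates the unknown parameter recursively. The function name, the array layout (`P[a, s, j]`, `c[s, a]`), the reference state, and the two-state example are all assumptions made for illustration.

```python
import numpy as np

def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Textbook relative value iteration for a finite average-cost MDP.

    P: array of shape (A, S, S), P[a, s, j] = probability of moving s -> j under action a.
    c: array of shape (S, A), one-stage cost of action a in state s.
    Returns (gain, h, policy): average-cost estimate, relative values, greedy policy.
    """
    h = np.zeros(c.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = c[s, a] + sum_j P[a, s, j] * h[j]
        Q = c + np.einsum("asj,j->sa", P, h)
        Th = Q.min(axis=1)
        # Subtract the value at the reference state to keep the iterates bounded.
        h_new = Th - Th[ref_state]
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    Q = c + np.einsum("asj,j->sa", P, h)
    return Q.min(axis=1)[ref_state], h, Q.argmin(axis=1)

# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
c = np.array([[1.0, 1.0],      # state 0 costs 1 under either action
              [3.0, 3.0]])     # state 1 costs 3 under either action
gain, h, policy = relative_value_iteration(P, c)
```

In this toy example the cheapest long-run behaviour is to move to state 0 and stay there, so the greedy policy stays in state 0 and switches out of state 1. In a self-tuning setting, `P` would be replaced at each step by the transition matrix evaluated at the current parameter estimate; that coupling is omitted here.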

Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

V. Borkar, S. Associate (1998)

Applicationes Mathematicae

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. Asymptotic behaviour of the posterior law of the parameter given the observed trajectory is analyzed. This analysis suggests a "cost-biased" estimation scheme and associated self-tuning adaptive control. This is shown to be asymptotically optimal in the almost sure sense.

Time-discretization for controlled Markov processes. I. General approximation results

Nico M. van Dijk, Arie Hordijk (1996)

Kybernetika

Risk-sensitive control of stochastic hybrid systems on infinite time horizon.

Runolfsson, Thordur (2000)

Mathematical Problems in Engineering

Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion

Evgueni I. Gordienko, J. Adolfo Minjárez-Sosa (1998)

Kybernetika

We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrence equations $x_{t + 1} = F (x_{t}, a_{t}, ξ_{t})$, $t = 0, 1, ...$, with i.i.d. $ℜ^{k}$-valued random vectors $ξ_{t}$ whose density $ρ$ is unknown. Assuming that $ξ_{t}$ is observable, we propose a statistical estimation procedure for $ρ$ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded...