Partially observable Markov decision processes with partially observable random discount factors

E. Everardo Martinez-Garcia; J. Adolfo Minjárez-Sosa; Oscar Vega-Amaya

Displaying similar documents to “Partially observable Markov decision processes with partially observable random discount factors”

Time-discretization for controlled Markov processes. I. General approximation results

Nico M. van Dijk, Arie Hordijk (1996)

Kybernetika

Estimates of stability of Markov control processes with unbounded costs

Evgueni I. Gordienko, Francisco Salem-Silva (2000)

Kybernetika

For a discrete-time Markov control process with transition probability $p$, we compare the total discounted costs $V_{β}(π_{β})$ and $V_{β}({\tilde{π}}_{β})$ obtained when applying the optimal control policy $π_{β}$ and its approximation ${\tilde{π}}_{β}$. The policy ${\tilde{π}}_{β}$ is optimal for an approximating process with transition probability $\tilde{p}$. The cost per stage of the processes considered may be unbounded. Under certain ergodicity assumptions we establish an upper bound for the relative stability index $[V_{β}({\tilde{π}}_{β}) - V_{β}(π_{β})] / V_{β}(π_{β})$. This bound does not depend...

Deterministic optimal policies for Markov control processes with pathwise constraints

Armando F. Mendoza-Pérez, Onésimo Hernández-Lerma (2012)

Applicationes Mathematicae

This paper deals with discrete-time Markov control processes in Borel spaces with unbounded rewards. Under suitable hypotheses, we show that a randomized stationary policy is optimal for a certain expected constrained problem (ECP) if and only if it is optimal for the corresponding pathwise constrained problem (pathwise CP). Moreover, we show that a certain parametric family of unconstrained optimality equations yields convergence properties that lead to an approximation scheme which...

Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion

Evgueni I. Gordienko, J. Adolfo Minjárez-Sosa (1998)

Kybernetika

We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by the recurrent equations $x_{t+1} = F(x_{t}, a_{t}, ξ_{t})$, $t = 0, 1, \ldots$, with i.i.d. $ℜ^{k}$-valued random vectors $ξ_{t}$ whose density $ρ$ is unknown. Assuming observability of $ξ_{t}$, we propose a statistical estimation procedure for $ρ$ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded...