Exact solution of the Bellman equation for a $β$-discounted reward in a two-armed bandit with switching arms.

Donchev, Doncho S.

Displaying similar documents to “Exact solution of the Bellman equation for a $β$-discounted reward in a two-armed bandit with switching arms.”

On selecting the most reliable components.

Shi, Dylan (1998)

Journal of Applied Mathematics and Decision Sciences

A semimartingale characterization of average optimal stationary policies for Markov decision processes.

Zhu, Quanxin; Guo, Xianping (2006)

Journal of Applied Mathematics and Stochastic Analysis

Solution to the optimality equation in a class of Markov decision chains with the average cost criterion

Rolando Cavazos-Cadena (1991)

Kybernetika

Estimates of stability of Markov control processes with unbounded costs

Evgueni I. Gordienko, Francisco Salem-Silva (2000)

Kybernetika

For a discrete-time Markov control process with transition probability $p$, we compare the total discounted costs $V_{β}(π_{β})$ and $V_{β}({\tilde{π}}_{β})$ obtained when applying the optimal control policy $π_{β}$ and its approximation ${\tilde{π}}_{β}$. The policy ${\tilde{π}}_{β}$ is optimal for an approximating process with transition probability $\tilde{p}$. The cost per stage of the processes considered may be unbounded. Under certain ergodicity assumptions we establish an upper bound for the relative stability index $[V_{β}({\tilde{π}}_{β}) - V_{β}(π_{β})] / V_{β}(π_{β})$. This bound does not depend...

Approximation and adaptive control of Markov processes: Average reward criterion

Onésimo Hernández-Lerma (1987)

Kybernetika

Optimal replacement under additive damage and self-restoration

Dror Zuckerman (1980)

RAIRO - Operations Research - Recherche Opérationnelle
