Calculating the variance in Markov-processes with random reward.
Trabajos de Estadística e Investigación Operativa (1982)
- Volume: 33, Issue: 3, pages 73-85
- ISSN: 0041-0241
How to cite
Benito, Francisco. "Calculating the variance in Markov-processes with random reward." Trabajos de Estadística e Investigación Operativa 33.3 (1982): 73-85. <http://eudml.org/doc/40668>.
@article{Benito1982,
abstract = {In this article we present a generalization of Markov Decision Processes with discrete time where the immediate rewards in every period are not deterministic but random, with the first two moments of the distribution given. Formulas are developed to calculate the expected value and the variance of the reward of the process, formulas which generalize and partially correct other results. We make some observations about the distribution of rewards for processes with limited or unlimited horizon and with or without discounting. Applications with risk-sensitive policies are possible; this is illustrated in a numerical example where the results are revalidated by simulation.},
author = {Benito, Francisco},
journal = {Trabajos de Estadística e Investigación Operativa},
keywords = {Dynamic programming; Decision processes; Markov process; random reward; finite Markov decision processes; mean; variance; finite horizon; total discounted reward; infinite horizon},
language = {eng},
number = {3},
pages = {73-85},
title = {Calculating the variance in Markov-processes with random reward.},
url = {http://eudml.org/doc/40668},
volume = {33},
year = {1982},
}
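The computation the abstract describes, namely the expected value and the variance of the total (discounted) reward when each immediate reward is random with known first two moments, can be organized as a backward recursion over the horizon. The Python sketch below is an illustrative reconstruction under standard assumptions (each reward independent of the rest of the path given the transition taken), not the paper's own formulas; the names P, mean_r, var_r, beta and horizon are assumptions introduced here.

import numpy as np

def reward_moments(P, mean_r, var_r, beta=1.0, horizon=10):
    # P[i, j]      : transition probability i -> j
    # mean_r[i, j] : E[r_ij], mean of the reward earned on the transition i -> j
    # var_r[i, j]  : Var[r_ij]; only the first two moments are needed
    # Returns the mean and variance of the total discounted reward per start state.
    V = np.zeros(P.shape[0])    # E[reward to go], horizon 0
    M2 = np.zeros(P.shape[0])   # E[(reward to go)^2], horizon 0
    Er2 = var_r + mean_r**2     # second moment of each immediate reward
    for _ in range(horizon):
        V_next = (P * mean_r).sum(axis=1) + beta * (P @ V)
        M2_next = (P * (Er2 + 2.0 * beta * mean_r * V[None, :])).sum(axis=1) \
                  + beta**2 * (P @ M2)
        V, M2 = V_next, M2_next
    return V, M2 - V**2

With beta < 1 the recursion converges as the horizon grows, which is consistent with the abstract's remark that both limited and unlimited horizons, with or without discounting, can be treated.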
TY - JOUR
AU - Benito, Francisco
TI - Calculating the variance in Markov-processes with random reward.
JO - Trabajos de Estadística e Investigación Operativa
PY - 1982
VL - 33
IS - 3
SP - 73
EP - 85
AB - In this article we present a generalization of Markov Decision Processes with discrete time where the immediate rewards in every period are not deterministic but random, with the first two moments of the distribution given. Formulas are developed to calculate the expected value and the variance of the reward of the process, formulas which generalize and partially correct other results. We make some observations about the distribution of rewards for processes with limited or unlimited horizon and with or without discounting. Applications with risk-sensitive policies are possible; this is illustrated in a numerical example where the results are revalidated by simulation.
LA - eng
KW - Dynamic programming; Decision processes; Markov process; random reward; finite Markov decision processes; mean; variance; finite horizon; total discounted reward; infinite horizon
UR - http://eudml.org/doc/40668
ER -
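The abstract states that the numerical example is revalidated by simulation. A minimal Monte Carlo check in the same spirit, with all data (P, mean_r, var_r, and the Gaussian reward distribution) invented here purely to supply concrete first two moments, could look as follows; its empirical mean and variance should agree with the recursion sketched above.

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3], [0.4, 0.6]])           # hypothetical transition matrix
mean_r = np.array([[1.0, 2.0], [0.5, 3.0]])      # hypothetical reward means
var_r = np.array([[0.25, 1.0], [0.5, 0.75]])     # hypothetical reward variances
beta, horizon, runs = 0.9, 20, 200_000

totals = np.zeros(runs)
for k in range(runs):
    state, disc = 0, 1.0
    for _ in range(horizon):
        nxt = rng.choice(2, p=P[state])
        # Gaussian is an arbitrary choice; only its mean and variance matter here
        totals[k] += disc * rng.normal(mean_r[state, nxt], np.sqrt(var_r[state, nxt]))
        state, disc = nxt, disc * beta
print(totals.mean(), totals.var())   # compare with reward_moments(P, mean_r, var_r, 0.9, 20)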