Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion

Evgueni I. Gordienko; J. Adolfo Minjárez-Sosa

Kybernetika (1998)

Volume: 34, Issue: 2, page [217]-234
ISSN: 0023-5954

Abstract

We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by the recurrent equations $x_{t+1} = F(x_t, a_t, \xi_t)$, $t = 0, 1, \ldots$, with i.i.d. $\Re^k$-valued random vectors $\xi_t$ whose density $\rho$ is unknown. Assuming observability of $\xi_t$, we propose a procedure for the statistical estimation of $\rho$ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
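
The setting in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the dynamics `F`, the one-stage cost `cost`, the exponential disturbance law, the discount factor, and the Gaussian kernel estimator with an `n ** (-1/5)` bandwidth, which stands in for the authors' estimation procedure. The policy is a fixed placeholder; an adaptive policy would consult the current density estimate when choosing each action.

```python
import math
import random


def F(x, a, xi):
    """Hypothetical dynamics x_{t+1} = F(x_t, a_t, xi_t): a storage-type recursion."""
    return max(x + a - xi, 0.0)


def cost(x, a):
    """Hypothetical one-stage cost, unbounded in the state x."""
    return x + 2.0 * a


def simulate(policy, x0, horizon, alpha, seed=0):
    """Run the recurrence, recording the observed disturbances and the discounted cost."""
    rng = random.Random(seed)
    x, observed, total = x0, [], 0.0
    for t in range(horizon):
        a = policy(x)
        total += (alpha ** t) * cost(x, a)
        xi = rng.expovariate(1.0)  # assumed true law, unknown to the controller
        observed.append(xi)        # xi_t is observable, as the abstract assumes
        x = F(x, a, xi)
    return total, observed


def kernel_density_estimate(samples, bandwidth=None):
    """Gaussian kernel estimate of the density (a stand-in, not the paper's estimator)."""
    n = len(samples)
    if bandwidth is None:
        bandwidth = n ** (-1.0 / 5.0)  # common rule-of-thumb rate, assumed here
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def rho_hat(s):
        return norm * sum(
            math.exp(-0.5 * ((s - xi) / bandwidth) ** 2) for xi in samples
        )

    return rho_hat


# Usage: simulate under a constant placeholder policy, then estimate the density.
total, observed = simulate(lambda x: 0.5, x0=1.0, horizon=300, alpha=0.9)
rho_hat = kernel_density_estimate(observed)
print(total, rho_hat(1.0))
```

Because the disturbances are observed directly, the estimate can be refreshed after every step from the growing sample, independently of which actions were taken.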

How to cite

Gordienko, Evgueni I., and Minjárez-Sosa, J. Adolfo. "Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion." Kybernetika 34.2 (1998): [217]-234. <http://eudml.org/doc/33349>.

@article{Gordienko1998,
abstract = {We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by the recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$, we propose a procedure for the statistical estimation of $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.},
author = {Gordienko, Evgueni I. and Minjárez-Sosa, J. Adolfo},
journal = {Kybernetika},
keywords = {Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence},
language = {eng},
number = {2},
pages = {[217]-234},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion},
url = {http://eudml.org/doc/33349},
volume = {34},
year = {1998},
}

TY - JOUR
AU - Gordienko, Evgueni I.
AU - Minjárez-Sosa, J. Adolfo
TI - Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
JO - Kybernetika
PY - 1998
PB - Institute of Information Theory and Automation AS CR
VL - 34
IS - 2
SP - [217]
EP - 234
AB - We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by the recurrent equations $x_{t+1}=F(x_t,a_t,\xi _t),\,\,t=0,1,\ldots $ with i.i.d. $\Re ^k$-valued random vectors $\xi _t$ whose density $\rho $ is unknown. Assuming observability of $\xi _t$, we propose a procedure for the statistical estimation of $\rho $ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
LA - eng
KW - Markov control process; unbounded costs; discounted asymptotic optimality; density estimator; rate of convergence
UR - http://eudml.org/doc/33349
ER -
