Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach
R. Israel Ortega-Gutiérrez; Raúl Montes-de-Oca; Enrique Lemus-Rodríguez
Kybernetika (2016)
- Volume: 52, Issue: 1, page 66-75
- ISSN: 0023-5954
Access Full Article
topAbstract
topHow to cite
topOrtega-Gutiérrez, R. Israel, Montes-de-Oca, Raúl, and Lemus-Rodríguez, Enrique. "Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach." Kybernetika 52.1 (2016): 66-75. <http://eudml.org/doc/276771>.
@article{Ortega2016,
abstract = {Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.},
author = {Ortega-Gutiérrez, R. Israel, Montes-de-Oca, Raúl, Lemus-Rodríguez, Enrique},
journal = {Kybernetika},
keywords = {discounted Markov decision processes; dynamic programming; unique optimal policy; non-uniqueness of optimal policies; Ekeland's variational principle},
language = {eng},
number = {1},
pages = {66-75},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach},
url = {http://eudml.org/doc/276771},
volume = {52},
year = {2016},
}
TY - JOUR
AU - Ortega-Gutiérrez, R. Israel
AU - Montes-de-Oca, Raúl
AU - Lemus-Rodríguez, Enrique
TI - Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach
JO - Kybernetika
PY - 2016
PB - Institute of Information Theory and Automation AS CR
VL - 52
IS - 1
SP - 66
EP - 75
AB - Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.
LA - eng
KW - discounted Markov decision processes; dynamic programming; unique optimal policy; non-uniqueness of optimal policies; Ekeland's variational principle
UR - http://eudml.org/doc/276771
ER -
References
top- Bertsekas, D. P., Dynamic Programming: Deterministic and Stochastic Models., Prentice-Hall, NJ 1987. Zbl0649.93001MR0896902
- Bishop, E., Phelps, R. R., 10.1090/pspum/007/0154092, In: Proc. Sympos. Pure Math. Vol. VII, 1963 (V. L. Klee, ed.), Amer. Math. Soc., pp. 27-35. Zbl0149.08601MR0154092DOI10.1090/pspum/007/0154092
- Borwein, J. M., Zhu, Q. J., Techniques of Variational Analysis., Springer, New York 2005. Zbl1076.49001MR2144010
- Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F., 10.1007/s001860400372, Math. Methods Oper. Res. 60 (2004), 415-436. Zbl1104.90053MR2106092DOI10.1007/s001860400372
- Cruz-Suárez, D., Montes-de-Oca, R., Uniform convergence of the value iteration policies for discounted Markov decision processes., Bol. Soc. Mat. Mexicana 12 (2006), 133-152. MR2301750
- Ekeland, I., 10.1016/0022-247x(74)90025-0, J. Math. Anal. Appl. 67 (1974), 324-353. Zbl0286.49015MR0346619DOI10.1016/0022-247x(74)90025-0
- Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-1-4612-0729-0, Springer-Verlag, New York 1996. Zbl0840.93001MR1363487DOI10.1007/978-1-4612-0729-0
- Lucchetti, R., 10.1007/0-387-31082-7, CMS Books in Mathematics, Springer, New York 2006. Zbl1106.49001MR2179578DOI10.1007/0-387-31082-7
- Montes-de-Oca, R., Lemus-Rodríguez, E., An unbounded Berge's minimum theorem with applications to discounted Markov decision processes., Kybernetika 48 (2012), 268-286. Zbl1275.90124MR2954325
- Montes-de-Oca, R., Lemus-Rodríguez, E., Salem-Silva, F., 10.1155/2013/271279, J. Appl. Math. 2013 (2013), 1-5. Zbl1266.90113MR3039713DOI10.1155/2013/271279
- Rockafellar, R. T., Wets, R. J. B., Variational Analysis., Springer, New York 2004. Zbl0888.49001MR1491362
- Tanaka, K., Hosino, M., Kuroiwa, D., On an -optimal policy of discrete time stochastic control processes., Bull. Inform. Cybernet. 27 (1995), 107-119. MR1335274
NotesEmbed ?
topTo embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.