Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach

R. Israel Ortega-Gutiérrez; Raúl Montes-de-Oca; Enrique Lemus-Rodríguez

Kybernetika (2016)

  • Volume: 52, Issue: 1, page 66-75
  • ISSN: 0023-5954

Abstract

top
Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.

How to cite

top

Ortega-Gutiérrez, R. Israel, Montes-de-Oca, Raúl, and Lemus-Rodríguez, Enrique. "Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach." Kybernetika 52.1 (2016): 66-75. <http://eudml.org/doc/276771>.

@article{Ortega2016,
abstract = {Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.},
author = {Ortega-Gutiérrez, R. Israel, Montes-de-Oca, Raúl, Lemus-Rodríguez, Enrique},
journal = {Kybernetika},
keywords = {discounted Markov decision processes; dynamic programming; unique optimal policy; non-uniqueness of optimal policies; Ekeland's variational principle},
language = {eng},
number = {1},
pages = {66-75},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach},
url = {http://eudml.org/doc/276771},
volume = {52},
year = {2016},
}

TY - JOUR
AU - Ortega-Gutiérrez, R. Israel
AU - Montes-de-Oca, Raúl
AU - Lemus-Rodríguez, Enrique
TI - Uniqueness of optimal policies as a generic property of discounted Markov decision processes: Ekeland's variational principle approach
JO - Kybernetika
PY - 2016
PB - Institute of Information Theory and Automation AS CR
VL - 52
IS - 1
SP - 66
EP - 75
AB - Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle for each problem with at least two optimal policies; a perturbed model is produced with a unique optimal policy. This result not only supersedes previous papers on the subject, but it also renews the interest in the corresponding questions of well-posedness, genericity and structural stability of MDPs.
LA - eng
KW - discounted Markov decision processes; dynamic programming; unique optimal policy; non-uniqueness of optimal policies; Ekeland's variational principle
UR - http://eudml.org/doc/276771
ER -

References

top
  1. Bertsekas, D. P., Dynamic Programming: Deterministic and Stochastic Models., Prentice-Hall, NJ 1987. Zbl0649.93001MR0896902
  2. Bishop, E., Phelps, R. R., 10.1090/pspum/007/0154092, In: Proc. Sympos. Pure Math. Vol. VII, 1963 (V. L. Klee, ed.), Amer. Math. Soc., pp. 27-35. Zbl0149.08601MR0154092DOI10.1090/pspum/007/0154092
  3. Borwein, J. M., Zhu, Q. J., Techniques of Variational Analysis., Springer, New York 2005. Zbl1076.49001MR2144010
  4. Cruz-Suárez, D., Montes-de-Oca, R., Salem-Silva, F., 10.1007/s001860400372, Math. Methods Oper. Res. 60 (2004), 415-436. Zbl1104.90053MR2106092DOI10.1007/s001860400372
  5. Cruz-Suárez, D., Montes-de-Oca, R., Uniform convergence of the value iteration policies for discounted Markov decision processes., Bol. Soc. Mat. Mexicana 12 (2006), 133-152. MR2301750
  6. Ekeland, I., 10.1016/0022-247x(74)90025-0, J. Math. Anal. Appl. 67 (1974), 324-353. Zbl0286.49015MR0346619DOI10.1016/0022-247x(74)90025-0
  7. Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-1-4612-0729-0, Springer-Verlag, New York 1996. Zbl0840.93001MR1363487DOI10.1007/978-1-4612-0729-0
  8. Lucchetti, R., 10.1007/0-387-31082-7, CMS Books in Mathematics, Springer, New York 2006. Zbl1106.49001MR2179578DOI10.1007/0-387-31082-7
  9. Montes-de-Oca, R., Lemus-Rodríguez, E., An unbounded Berge's minimum theorem with applications to discounted Markov decision processes., Kybernetika 48 (2012), 268-286. Zbl1275.90124MR2954325
  10. Montes-de-Oca, R., Lemus-Rodríguez, E., Salem-Silva, F., 10.1155/2013/271279, J. Appl. Math. 2013 (2013), 1-5. Zbl1266.90113MR3039713DOI10.1155/2013/271279
  11. Rockafellar, R. T., Wets, R. J. B., Variational Analysis., Springer, New York 2004. Zbl0888.49001MR1491362
  12. Tanaka, K., Hosino, M., Kuroiwa, D., On an ε -optimal policy of discrete time stochastic control processes., Bull. Inform. Cybernet. 27 (1995), 107-119. MR1335274

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.