A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs

Óscar Vega-Amaya; Joaquín López-Borbón

Kybernetika (2019)

  • Volume: 55, Issue: 1, page 81-113
  • ISSN: 0023-5954

Abstract

top
The present paper studies the approximate value iteration (AVI) algorithm for the average cost criterion with bounded costs and Borel spaces. It is shown the convergence of the algorithm and provided a performance bound assuming that the model satisfies a standard continuity-compactness assumption and a uniform ergodicity condition. This is done for the class of approximation procedures that can be represented by linear positive operators which give exact representation of constant functions and also satisfy certain continuity property. The main point is that these operators define transition probabilities on the state space of the controlled system. This has the following important consequences: (a) the approximating function is the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be seen as a perturbation of the original Markov model; (c) the perturbed model inherits the ergodicity properties imposed on the original Markov model. These facts allow to bound the AVI algorithm performance in terms of the accuracy of the approximations given by this kind of operators for the primitive data model, namely, the one-step reward function and the system transition law. The bounds are given in terms of the supremum norm of bounded functions and the total variation norm of finite-signed measures. The results are illustrated with numerical approximations for a class of single item inventory systems with linear order cost, no set-up cost and no back-orders.

How to cite

top

Vega-Amaya, Óscar, and López-Borbón, Joaquín. "A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs." Kybernetika 55.1 (2019): 81-113. <http://eudml.org/doc/294175>.

@article{Vega2019,
abstract = {The present paper studies the approximate value iteration (AVI) algorithm for the average cost criterion with bounded costs and Borel spaces. It is shown the convergence of the algorithm and provided a performance bound assuming that the model satisfies a standard continuity-compactness assumption and a uniform ergodicity condition. This is done for the class of approximation procedures that can be represented by linear positive operators which give exact representation of constant functions and also satisfy certain continuity property. The main point is that these operators define transition probabilities on the state space of the controlled system. This has the following important consequences: (a) the approximating function is the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be seen as a perturbation of the original Markov model; (c) the perturbed model inherits the ergodicity properties imposed on the original Markov model. These facts allow to bound the AVI algorithm performance in terms of the accuracy of the approximations given by this kind of operators for the primitive data model, namely, the one-step reward function and the system transition law. The bounds are given in terms of the supremum norm of bounded functions and the total variation norm of finite-signed measures. The results are illustrated with numerical approximations for a class of single item inventory systems with linear order cost, no set-up cost and no back-orders.},
author = {Vega-Amaya, Óscar, López-Borbón, Joaquín},
journal = {Kybernetika},
keywords = {Markov decision processes; average cost criterion; approximate value iteration algorithm; contraction and non-expansive operators; perturbed Markov decision models},
language = {eng},
number = {1},
pages = {81-113},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs},
url = {http://eudml.org/doc/294175},
volume = {55},
year = {2019},
}

TY - JOUR
AU - Vega-Amaya, Óscar
AU - López-Borbón, Joaquín
TI - A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
JO - Kybernetika
PY - 2019
PB - Institute of Information Theory and Automation AS CR
VL - 55
IS - 1
SP - 81
EP - 113
AB - The present paper studies the approximate value iteration (AVI) algorithm for the average cost criterion with bounded costs and Borel spaces. It is shown the convergence of the algorithm and provided a performance bound assuming that the model satisfies a standard continuity-compactness assumption and a uniform ergodicity condition. This is done for the class of approximation procedures that can be represented by linear positive operators which give exact representation of constant functions and also satisfy certain continuity property. The main point is that these operators define transition probabilities on the state space of the controlled system. This has the following important consequences: (a) the approximating function is the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be seen as a perturbation of the original Markov model; (c) the perturbed model inherits the ergodicity properties imposed on the original Markov model. These facts allow to bound the AVI algorithm performance in terms of the accuracy of the approximations given by this kind of operators for the primitive data model, namely, the one-step reward function and the system transition law. The bounds are given in terms of the supremum norm of bounded functions and the total variation norm of finite-signed measures. The results are illustrated with numerical approximations for a class of single item inventory systems with linear order cost, no set-up cost and no back-orders.
LA - eng
KW - Markov decision processes; average cost criterion; approximate value iteration algorithm; contraction and non-expansive operators; perturbed Markov decision models
UR - http://eudml.org/doc/294175
ER -

References

top
  1. Abounadi, J., Bertsekas, D., Borkar, V. S., 10.1137/s0363012999361974, SIAM J. Control Optim. 40 (2001), 681-698. MR1871450DOI10.1137/s0363012999361974
  2. Aliprantis, C. D., Border, K. C., Infinite Dimensional Analysis. Third edition., Springer-Verlag, Berlin 2006. MR2378491
  3. Almudevar, A., 10.1137/040614384, SIAM J. Control Optim. 46 (2008), 541-561. MR2448464DOI10.1137/040614384
  4. Araposthatis, A., Borkar, V. S., Fernández-Guacherand, E., Gosh, M. K., Marcus, S. I., 10.1137/0331018, SIAM J. Control Optim. 31 (1993) 282-344. MR1205981DOI10.1137/0331018
  5. Beutel, L., Gonska, H., Kacsó, D., On variation-diminishing Shoenberg operators: new quantitative statements., In: Multivariate Approximation and Interpolations with Applications (M. Gasca, ed.), Monografías de la Academia de Ciencias de Zaragoza No. 20 2002, pp. 9-58. MR1966063
  6. Bertsekas, D. P., Dynamic Programming: Deterministic and Stochastic Models., Prentice-Hall, Englewood Cliffs NJ 1987. Zbl0649.93001MR0896902
  7. Bertsekas, D. P., Tsitsiklis, J. N., Neuro-Dynamic Programming., Athena Scientific, Belmont 1996. Zbl0924.68163MR3444832
  8. Bertsekas, D. P., 10.1007/s11768-011-1005-3, J. Control Theory Appl. 9 (2011), 310-335. MR2833999DOI10.1007/s11768-011-1005-3
  9. H., Chang, §., Marcus, S. I., 10.1016/s0022-247x(03)00506-7, J. Math. Anal. Appl. 286 (2003), 636-651. MR2008853DOI10.1016/s0022-247x(03)00506-7
  10. Chang, H. S., Hu, J., Fu, M. C., Marcus, S. I., 10.1007/978-1-4471-5022-0, Springer-Verlag, London 2013. MR3052425DOI10.1007/978-1-4471-5022-0
  11. Cooper, W. L., Henderson, S. G., Lewis, M. E., 10.1017/s0269964803172051, Prob. Eng. Inform. Sci. 17 (2003), 213-234. MR1961537DOI10.1017/s0269964803172051
  12. DeVore, R. A., 10.1007/bfb0059493, Lectures Notes in Mathematics 293. Springer-Verlag, Berlin, Heidelberg 1972. MR0420083DOI10.1007/bfb0059493
  13. Dorea, C. C. Y., Pereira, A. G. C., 10.1007/s10474-006-0023-y, Acta Math. Hungar. 110, Issue 4, (2006), 287-292. MR2213230DOI10.1007/s10474-006-0023-y
  14. Prieto-Rumeau, F. Dufour adn T., 10.1016/j.jmaa.2011.11.015, J. Math. Anal. Appl. 388 (2012), 1254-1267. MR2869823DOI10.1016/j.jmaa.2011.11.015
  15. Dufour, F., Prieto-Rumeau, T., 10.1016/j.jmaa.2013.12.016, J. Math. Anal. Appl. 413 (2014), 856-879. MR3159809DOI10.1016/j.jmaa.2013.12.016
  16. Dufour, F., Prieto-Rumeau, T., 10.1080/17442508.2014.939979, Stochastics 87 (2015), 273-307. MR3316812DOI10.1080/17442508.2014.939979
  17. Farias, D. P. de, Roy, B. van, 10.1023/a:1004641123405, J. Optim. Theory Appl. 105 (2000), 589-608. MR1783879DOI10.1023/a:1004641123405
  18. Farias, D. P. de, Roy, B. van, Approximate linear programming for average-cots dynamic programming., In: Advances in Neural Information Processing Systems 15 (S. Becker, S. Thrun and K. Obermayer, eds.), MIT Press, Cambridge MA 2002, pp. 1587-1594. 
  19. Farias, D. P. de, Roy, B. Van, 10.1287/moor.1060.0208, Math. Oper. Res. 31 (2006), 597-620. MR2254426DOI10.1287/moor.1060.0208
  20. Gordon, G. J., 10.1016/b978-1-55860-377-6.50040-2, In: Proc. Twelfth International Conference on Machine Learning (A. Prieditis and S. J. Russell, eds.), Tahoe City CA 1995, pp. 261-268. DOI10.1016/b978-1-55860-377-6.50040-2
  21. Gosavi, A., 10.1023/b:mach.0000019802.64038.6c, Machine Learning 55 (2004), 5-29. MR2549123DOI10.1023/b:mach.0000019802.64038.6c
  22. Hernández-Lerma, O., 10.1007/978-1-4419-8714-3, Springer-Verlag, NY 1989. Zbl0677.93073MR0995463DOI10.1007/978-1-4419-8714-3
  23. Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-1-4612-0729-0, Springer-Verlag, NY 1996. Zbl0840.93001MR1363487DOI10.1007/978-1-4612-0729-0
  24. Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-1-4612-0561-6, Springer-Verlag, NY 1999. Zbl0928.93002MR1697198DOI10.1007/978-1-4612-0561-6
  25. Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-3-0348-8024-4, Birkhauser Verlag, Basel 2003. Zbl1036.60003MR1974383DOI10.1007/978-3-0348-8024-4
  26. Hernández-Lerma, O., Montes-de-Oca, R., Cavazos-Cadena, R., 10.1007/bf02055573, Ann. Oper. Res. 29 (1991), 29-46. MR1105165DOI10.1007/bf02055573
  27. Hernández-Lerma, O., Vega-Amaya, O., Carrasco, G., 10.1137/s0363012998340673, SIAM J. Control Optim. 38 (1999), 79-93. Zbl0951.93074MR1740606DOI10.1137/s0363012998340673
  28. Jaskiewicz, A., Nowak, A. S., 10.1016/j.jmaa.2005.04.065, J. Math. Anal. Appl. 316 (2006), 495-509. MR2206685DOI10.1016/j.jmaa.2005.04.065
  29. Klein, E., Thompson, A. C., Theory of Correspondences., Wiley, New York 1984. MR0752692
  30. Konda, V. R., Tsitsiklis, J. N., 10.1137/s0363012901385691, SIAM J. Control Optim. 42 (2003), 1143-1166. MR2044789DOI10.1137/s0363012901385691
  31. Lee, J. M., Lee, J. H., Approximate dynamic programming strategies and their applicability for process control: a review and future direction., Int. J. Control Automat. Systems 2 (2004), 263-278. 
  32. Meyn, S. P., Tweedie, R. L., 10.1007/978-1-4471-3267-7, Springer-Verlag, London 1993. MR1287609DOI10.1007/978-1-4471-3267-7
  33. Montes-de-Oca, R., Lemus-Rodríguez, E., An unbounded Berge's minimum theorem with applications to discounted Markov decision processes., Kybernetika 48 (2012), 268-286. Zbl1275.90124MR2954325
  34. Munos, R., 10.1137/s0363012904441520, SIAM J. Control Optim. 47 (2007), 2303-2347. MR2309039DOI10.1137/s0363012904441520
  35. Nowak, A. S., 10.4064/am-25-3-295-299, Appl. Math. 25 (1998), 295-299. MR1637730DOI10.4064/am-25-3-295-299
  36. Ortner, R., 10.1007/978-3-540-75225-7_30, In: Algorithmic Learning Theory LNAI 4754 (M. Hutter, R. A. Serveido and E. Takimoto, eds.), Springer, Berlin, Heidelberg 2007, pp. 373-387. DOI10.1007/978-3-540-75225-7_30
  37. Puterman, M. L., 10.1002/9780470316887, Wiley, NY 1994. Zbl1184.90170MR1270015DOI10.1002/9780470316887
  38. Powell, W. P., 10.1002/9780470182963, John Wiley and Sons Inc., 2007. MR2347698DOI10.1002/9780470182963
  39. Powell, W. P., 10.1002/nav.20347, Naval Res. Logist. 56 (2009), 239-249. MR2503223DOI10.1002/nav.20347
  40. Powell, W. P., 10.1007/s10479-012-1077-6, Ann. Oper. Res. 241 (2012), 319-356. MR3509421DOI10.1007/s10479-012-1077-6
  41. Powell, W. P., Ma, J., 10.1007/s11768-011-0313-y, J. Control Theory Appl. 9 (2011), 336-352. MR2834000DOI10.1007/s11768-011-0313-y
  42. Robles-Alcaraz, M. T., Vega-Amaya, O., Minjárez-Sosa, J. Adolfo, 10.3233/rda-160116, Risk Decision Anal. 6 (2017), 79-95. DOI10.3233/rda-160116
  43. Rust, J., 10.1016/s1574-0021(96)01016-7, In: Handbook of Computational Economics, Vol. 13 (H. Amman, D. Kendrick and J. Rust, eds.), North-Holland, Amsterdam 1996, pp. 619-728. MR1416619DOI10.1016/s1574-0021(96)01016-7
  44. Santos, M. S., 10.2307/2998564, Econometrica 66 (1998), 409-426. MR1612175DOI10.2307/2998564
  45. Santos, M. S., Rust, J., 10.1137/s0363012902399824, SIAM J. Control Optim. 42 (2004) 2094-2115. MR2080931DOI10.1137/s0363012902399824
  46. Saldi, N., Yuksel, S., Linder, T., 10.1287/moor.2016.0832, Math. Oper. Res. 42 (2017), 945-978. MR3722422DOI10.1287/moor.2016.0832
  47. Stachurski, J., 10.1007/s10614-007-9111-5, Comput. Economics 31 (2008), 141-160. DOI10.1007/s10614-007-9111-5
  48. Sutton, R. S., Barto, A. G., 10.1108/k.1998.27.9.1093.3, MIT Press, Cambridge MA 1998. DOI10.1108/k.1998.27.9.1093.3
  49. Roy, B. Van, 10.1287/moor.1060.0188, Math. Oper. Res. 31 (2006), 234-244. MR2233994DOI10.1287/moor.1060.0188
  50. Vega-Amaya, O., The average cost optimality equation: a fixed-point approach., Bol. Soc. Mat. Mexicana 9 (2003), 185-195. MR1988598
  51. Vega-Amaya, O., 10.1137/s0363012902408423, SIAM J. Control Optim. 42 (2003), 1876-1894. MR2046390DOI10.1137/s0363012902408423
  52. Vega-Amaya, O., 10.1016/j.jmaa.2018.03.077, J. Math. Anal. Appl. 464 (2018), 152-163. MR3794081DOI10.1016/j.jmaa.2018.03.077
  53. Vega-Amaya, O., López-Borbón, J., 10.3934/jdg.2016014, J. Dyn. Games 3 (2016), 261-278. MR3562878DOI10.3934/jdg.2016014
  54. Montes-de-Oca, O. Vega-Amaya amd R., 10.1007/bf01198405, Math. Methods Oper. Res. 47 (1998) 451-471. MR1637569DOI10.1007/bf01198405

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.