A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs

Óscar Vega-Amaya; Joaquín López-Borbón

A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs

Óscar Vega-Amaya; Joaquín López-Borbón

Kybernetika (2019)

Volume: 55, Issue: 1, page 81-113
ISSN: 0023-5954

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

The present paper studies the approximate value iteration (AVI) algorithm for the average cost criterion with bounded costs and Borel spaces. It is shown the convergence of the algorithm and provided a performance bound assuming that the model satisfies a standard continuity-compactness assumption and a uniform ergodicity condition. This is done for the class of approximation procedures that can be represented by linear positive operators which give exact representation of constant functions and also satisfy certain continuity property. The main point is that these operators define transition probabilities on the state space of the controlled system. This has the following important consequences: (a) the approximating function is the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be seen as a perturbation of the original Markov model; (c) the perturbed model inherits the ergodicity properties imposed on the original Markov model. These facts allow to bound the AVI algorithm performance in terms of the accuracy of the approximations given by this kind of operators for the primitive data model, namely, the one-step reward function and the system transition law. The bounds are given in terms of the supremum norm of bounded functions and the total variation norm of finite-signed measures. The results are illustrated with numerical approximations for a class of single item inventory systems with linear order cost, no set-up cost and no back-orders.

How to cite

top

MLA
BibTeX
RIS

Vega-Amaya, Óscar, and López-Borbón, Joaquín. "A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs." Kybernetika 55.1 (2019): 81-113. <http://eudml.org/doc/294175>.

@article{Vega2019,
abstract = {The present paper studies the approximate value iteration (AVI) algorithm for the average cost criterion with bounded costs and Borel spaces. It is shown the convergence of the algorithm and provided a performance bound assuming that the model satisfies a standard continuity-compactness assumption and a uniform ergodicity condition. This is done for the class of approximation procedures that can be represented by linear positive operators which give exact representation of constant functions and also satisfy certain continuity property. The main point is that these operators define transition probabilities on the state space of the controlled system. This has the following important consequences: (a) the approximating function is the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be seen as a perturbation of the original Markov model; (c) the perturbed model inherits the ergodicity properties imposed on the original Markov model. These facts allow to bound the AVI algorithm performance in terms of the accuracy of the approximations given by this kind of operators for the primitive data model, namely, the one-step reward function and the system transition law. The bounds are given in terms of the supremum norm of bounded functions and the total variation norm of finite-signed measures. The results are illustrated with numerical approximations for a class of single item inventory systems with linear order cost, no set-up cost and no back-orders.},
author = {Vega-Amaya, Óscar, López-Borbón, Joaquín},
journal = {Kybernetika},
keywords = {Markov decision processes; average cost criterion; approximate value iteration algorithm; contraction and non-expansive operators; perturbed Markov decision models},
language = {eng},
number = {1},
pages = {81-113},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs},
url = {http://eudml.org/doc/294175},
volume = {55},
year = {2019},
}

TY - JOUR
AU - Vega-Amaya, Óscar
AU - López-Borbón, Joaquín
TI - A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
JO - Kybernetika
PY - 2019
PB - Institute of Information Theory and Automation AS CR
VL - 55
IS - 1
SP - 81
EP - 113
AB - The present paper studies the approximate value iteration (AVI) algorithm for the average cost criterion with bounded costs and Borel spaces. It is shown the convergence of the algorithm and provided a performance bound assuming that the model satisfies a standard continuity-compactness assumption and a uniform ergodicity condition. This is done for the class of approximation procedures that can be represented by linear positive operators which give exact representation of constant functions and also satisfy certain continuity property. The main point is that these operators define transition probabilities on the state space of the controlled system. This has the following important consequences: (a) the approximating function is the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be seen as a perturbation of the original Markov model; (c) the perturbed model inherits the ergodicity properties imposed on the original Markov model. These facts allow to bound the AVI algorithm performance in terms of the accuracy of the approximations given by this kind of operators for the primitive data model, namely, the one-step reward function and the system transition law. The bounds are given in terms of the supremum norm of bounded functions and the total variation norm of finite-signed measures. The results are illustrated with numerical approximations for a class of single item inventory systems with linear order cost, no set-up cost and no back-orders.
LA - eng
KW - Markov decision processes; average cost criterion; approximate value iteration algorithm; contraction and non-expansive operators; perturbed Markov decision models
UR - http://eudml.org/doc/294175
ER -

References

top

Abounadi, J., Bertsekas, D., Borkar, V. S., 10.1137/s0363012999361974, SIAM J. Control Optim. 40 (2001), 681-698. MR1871450 DOI10.1137/s0363012999361974
Aliprantis, C. D., Border, K. C., Infinite Dimensional Analysis. Third edition., Springer-Verlag, Berlin 2006. MR2378491
Almudevar, A., 10.1137/040614384, SIAM J. Control Optim. 46 (2008), 541-561. MR2448464 DOI10.1137/040614384
Araposthatis, A., Borkar, V. S., Fernández-Guacherand, E., Gosh, M. K., Marcus, S. I., 10.1137/0331018, SIAM J. Control Optim. 31 (1993) 282-344. MR1205981 DOI10.1137/0331018
Beutel, L., Gonska, H., Kacsó, D., On variation-diminishing Shoenberg operators: new quantitative statements., In: Multivariate Approximation and Interpolations with Applications (M. Gasca, ed.), Monografías de la Academia de Ciencias de Zaragoza No. 20 2002, pp. 9-58. MR1966063
Bertsekas, D. P., Dynamic Programming: Deterministic and Stochastic Models., Prentice-Hall, Englewood Cliffs NJ 1987. Zbl0649.93001 MR0896902
Bertsekas, D. P., Tsitsiklis, J. N., Neuro-Dynamic Programming., Athena Scientific, Belmont 1996. Zbl0924.68163 MR3444832
Bertsekas, D. P., 10.1007/s11768-011-1005-3, J. Control Theory Appl. 9 (2011), 310-335. MR2833999 DOI10.1007/s11768-011-1005-3
H., Chang, §., Marcus, S. I., 10.1016/s0022-247x(03)00506-7, J. Math. Anal. Appl. 286 (2003), 636-651. MR2008853 DOI10.1016/s0022-247x(03)00506-7
Chang, H. S., Hu, J., Fu, M. C., Marcus, S. I., 10.1007/978-1-4471-5022-0, Springer-Verlag, London 2013. MR3052425 DOI10.1007/978-1-4471-5022-0
Cooper, W. L., Henderson, S. G., Lewis, M. E., 10.1017/s0269964803172051, Prob. Eng. Inform. Sci. 17 (2003), 213-234. MR1961537 DOI10.1017/s0269964803172051
DeVore, R. A., 10.1007/bfb0059493, Lectures Notes in Mathematics 293. Springer-Verlag, Berlin, Heidelberg 1972. MR0420083 DOI10.1007/bfb0059493
Dorea, C. C. Y., Pereira, A. G. C., 10.1007/s10474-006-0023-y, Acta Math. Hungar. 110, Issue 4, (2006), 287-292. MR2213230 DOI10.1007/s10474-006-0023-y
Prieto-Rumeau, F. Dufour adn T., 10.1016/j.jmaa.2011.11.015, J. Math. Anal. Appl. 388 (2012), 1254-1267. MR2869823 DOI10.1016/j.jmaa.2011.11.015
Dufour, F., Prieto-Rumeau, T., 10.1016/j.jmaa.2013.12.016, J. Math. Anal. Appl. 413 (2014), 856-879. MR3159809 DOI10.1016/j.jmaa.2013.12.016
Dufour, F., Prieto-Rumeau, T., 10.1080/17442508.2014.939979, Stochastics 87 (2015), 273-307. MR3316812 DOI10.1080/17442508.2014.939979
Farias, D. P. de, Roy, B. van, 10.1023/a:1004641123405, J. Optim. Theory Appl. 105 (2000), 589-608. MR1783879 DOI10.1023/a:1004641123405
Farias, D. P. de, Roy, B. van, Approximate linear programming for average-cots dynamic programming., In: Advances in Neural Information Processing Systems 15 (S. Becker, S. Thrun and K. Obermayer, eds.), MIT Press, Cambridge MA 2002, pp. 1587-1594.
Farias, D. P. de, Roy, B. Van, 10.1287/moor.1060.0208, Math. Oper. Res. 31 (2006), 597-620. MR2254426 DOI10.1287/moor.1060.0208
Gordon, G. J., 10.1016/b978-1-55860-377-6.50040-2, In: Proc. Twelfth International Conference on Machine Learning (A. Prieditis and S. J. Russell, eds.), Tahoe City CA 1995, pp. 261-268. DOI10.1016/b978-1-55860-377-6.50040-2
Gosavi, A., 10.1023/b:mach.0000019802.64038.6c, Machine Learning 55 (2004), 5-29. MR2549123 DOI10.1023/b:mach.0000019802.64038.6c
Hernández-Lerma, O., 10.1007/978-1-4419-8714-3, Springer-Verlag, NY 1989. Zbl0677.93073 MR0995463 DOI10.1007/978-1-4419-8714-3
Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-1-4612-0729-0, Springer-Verlag, NY 1996. Zbl0840.93001 MR1363487 DOI10.1007/978-1-4612-0729-0
Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-1-4612-0561-6, Springer-Verlag, NY 1999. Zbl0928.93002 MR1697198 DOI10.1007/978-1-4612-0561-6
Hernández-Lerma, O., Lasserre, J. B., 10.1007/978-3-0348-8024-4, Birkhauser Verlag, Basel 2003. Zbl1036.60003 MR1974383 DOI10.1007/978-3-0348-8024-4
Hernández-Lerma, O., Montes-de-Oca, R., Cavazos-Cadena, R., 10.1007/bf02055573, Ann. Oper. Res. 29 (1991), 29-46. MR1105165 DOI10.1007/bf02055573
Hernández-Lerma, O., Vega-Amaya, O., Carrasco, G., 10.1137/s0363012998340673, SIAM J. Control Optim. 38 (1999), 79-93. Zbl0951.93074 MR1740606 DOI10.1137/s0363012998340673
Jaskiewicz, A., Nowak, A. S., 10.1016/j.jmaa.2005.04.065, J. Math. Anal. Appl. 316 (2006), 495-509. MR2206685 DOI10.1016/j.jmaa.2005.04.065
Klein, E., Thompson, A. C., Theory of Correspondences., Wiley, New York 1984. MR0752692
Konda, V. R., Tsitsiklis, J. N., 10.1137/s0363012901385691, SIAM J. Control Optim. 42 (2003), 1143-1166. MR2044789 DOI10.1137/s0363012901385691
Lee, J. M., Lee, J. H., Approximate dynamic programming strategies and their applicability for process control: a review and future direction., Int. J. Control Automat. Systems 2 (2004), 263-278.
Meyn, S. P., Tweedie, R. L., 10.1007/978-1-4471-3267-7, Springer-Verlag, London 1993. MR1287609 DOI10.1007/978-1-4471-3267-7
Montes-de-Oca, R., Lemus-Rodríguez, E., An unbounded Berge's minimum theorem with applications to discounted Markov decision processes., Kybernetika 48 (2012), 268-286. Zbl1275.90124 MR2954325
Munos, R., 10.1137/s0363012904441520, SIAM J. Control Optim. 47 (2007), 2303-2347. MR2309039 DOI10.1137/s0363012904441520
Nowak, A. S., 10.4064/am-25-3-295-299, Appl. Math. 25 (1998), 295-299. MR1637730 DOI10.4064/am-25-3-295-299
Ortner, R., 10.1007/978-3-540-75225-7_30, In: Algorithmic Learning Theory LNAI 4754 (M. Hutter, R. A. Serveido and E. Takimoto, eds.), Springer, Berlin, Heidelberg 2007, pp. 373-387. DOI10.1007/978-3-540-75225-7_30
Puterman, M. L., 10.1002/9780470316887, Wiley, NY 1994. Zbl1184.90170 MR1270015 DOI10.1002/9780470316887
Powell, W. P., 10.1002/9780470182963, John Wiley and Sons Inc., 2007. MR2347698 DOI10.1002/9780470182963
Powell, W. P., 10.1002/nav.20347, Naval Res. Logist. 56 (2009), 239-249. MR2503223 DOI10.1002/nav.20347
Powell, W. P., 10.1007/s10479-012-1077-6, Ann. Oper. Res. 241 (2012), 319-356. MR3509421 DOI10.1007/s10479-012-1077-6
Powell, W. P., Ma, J., 10.1007/s11768-011-0313-y, J. Control Theory Appl. 9 (2011), 336-352. MR2834000 DOI10.1007/s11768-011-0313-y
Robles-Alcaraz, M. T., Vega-Amaya, O., Minjárez-Sosa, J. Adolfo, 10.3233/rda-160116, Risk Decision Anal. 6 (2017), 79-95. DOI10.3233/rda-160116
Rust, J., 10.1016/s1574-0021(96)01016-7, In: Handbook of Computational Economics, Vol. 13 (H. Amman, D. Kendrick and J. Rust, eds.), North-Holland, Amsterdam 1996, pp. 619-728. MR1416619 DOI10.1016/s1574-0021(96)01016-7
Santos, M. S., 10.2307/2998564, Econometrica 66 (1998), 409-426. MR1612175 DOI10.2307/2998564
Santos, M. S., Rust, J., 10.1137/s0363012902399824, SIAM J. Control Optim. 42 (2004) 2094-2115. MR2080931 DOI10.1137/s0363012902399824
Saldi, N., Yuksel, S., Linder, T., 10.1287/moor.2016.0832, Math. Oper. Res. 42 (2017), 945-978. MR3722422 DOI10.1287/moor.2016.0832
Stachurski, J., 10.1007/s10614-007-9111-5, Comput. Economics 31 (2008), 141-160. DOI10.1007/s10614-007-9111-5
Sutton, R. S., Barto, A. G., 10.1108/k.1998.27.9.1093.3, MIT Press, Cambridge MA 1998. DOI10.1108/k.1998.27.9.1093.3
Roy, B. Van, 10.1287/moor.1060.0188, Math. Oper. Res. 31 (2006), 234-244. MR2233994 DOI10.1287/moor.1060.0188
Vega-Amaya, O., The average cost optimality equation: a fixed-point approach., Bol. Soc. Mat. Mexicana 9 (2003), 185-195. MR1988598
Vega-Amaya, O., 10.1137/s0363012902408423, SIAM J. Control Optim. 42 (2003), 1876-1894. MR2046390 DOI10.1137/s0363012902408423
Vega-Amaya, O., 10.1016/j.jmaa.2018.03.077, J. Math. Anal. Appl. 464 (2018), 152-163. MR3794081 DOI10.1016/j.jmaa.2018.03.077
Vega-Amaya, O., López-Borbón, J., 10.3934/jdg.2016014, J. Dyn. Games 3 (2016), 261-278. MR3562878 DOI10.3934/jdg.2016014
Montes-de-Oca, O. Vega-Amaya amd R., 10.1007/bf01198405, Math. Methods Oper. Res. 47 (1998) 451-471. MR1637569 DOI10.1007/bf01198405

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.