Approximate dynamic programming based on high dimensional model representation

Miroslav Pištěk

Kybernetika (2013)

  • Volume: 49, Issue: 5, page 720-737
  • ISSN: 0023-5954

Abstract

top
This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.

How to cite

top

Pištěk, Miroslav. "Approximate dynamic programming based on high dimensional model representation." Kybernetika 49.5 (2013): 720-737. <http://eudml.org/doc/260675>.

@article{Pištěk2013,
abstract = {This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.},
author = {Pištěk, Miroslav},
journal = {Kybernetika},
keywords = {approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem; approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem},
language = {eng},
number = {5},
pages = {720-737},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Approximate dynamic programming based on high dimensional model representation},
url = {http://eudml.org/doc/260675},
volume = {49},
year = {2013},
}

TY - JOUR
AU - Pištěk, Miroslav
TI - Approximate dynamic programming based on high dimensional model representation
JO - Kybernetika
PY - 2013
PB - Institute of Information Theory and Automation AS CR
VL - 49
IS - 5
SP - 720
EP - 737
AB - This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.
LA - eng
KW - approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem; approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem
UR - http://eudml.org/doc/260675
ER -

References

top
  1. Busygin, S., A new trust region technique for the maximum weight clique problem., Discrete Appl. Math. 154 (2002), 2006. Zbl1111.90020MR2257814
  2. Demiralp, M., High dimensional model representation and its application varieties., In: Proc. Fourth International Conference on Tools for Mathematical Modelling, St. Petersburg 2003, pp. 146-159. Zbl1237.93093
  3. Feil, K. S., Shah, N., 10.1002/wilj.18, Wilmott J. 1 (2009), 179-195. DOI10.1002/wilj.18
  4. Garivier, A., Cappé, O., The KL-UCB algorithm for bounded stochastic bandits and beyond., In: Proc. 24th Annual Conference on Learning Theory, Budapest 2011. 
  5. George, A., Powell, W. B., Kulkarni, S. R., Value function approximation using multiple aggregation for multiattribute resource management., J. Machine Learning Res. 9 (2008), 2079-2111. Zbl1225.68180
  6. Gittins, J., Bandit processes and dynamic allocation indices., J. Roy. Statist. Soc. Ser.B 41 (1979), 2, 148-177. Zbl0411.62055MR0547241
  7. Gosavi, A., 10.1287/ijoc.1080.0305, INFORMS J. Comput. 21 (2009), 178-192. Zbl1243.68240MR2549123DOI10.1287/ijoc.1080.0305
  8. Hauskrecht, M., Value-function approximations for partially observable markov decision processes., J. Artif. Internat. Res. 13 (2000), 33-94. Zbl0946.68131MR1781862
  9. Jaakkola, T., Singh, S. P., Jordan, M. I., Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems., MIT Press, 1995. 
  10. Karp, R. M., 10.1007/978-1-4684-2001-2_9, In: Complexity of Computer Computations (R. Miller and J. W. Thatcher, eds.), Plenum, New York 1972. Zbl1187.90014MR0378476DOI10.1007/978-1-4684-2001-2_9
  11. Kushner, H., Introduction to Stochastic Control., Holt, Rinehart and Winston, New York 1970. Zbl0293.93018MR0280248
  12. LeBlanc, M., Tibshirani, R., Combining estimates in regression and classification., J. Amer. Statist. Assoc. 91 (1996), 1641-1650. Zbl0881.62046MR1439105
  13. Li, G., Hu, J., Wang, S.-W., Georgopoulos, P. G., Schoendorf, J., Rabitz, H., 10.1021/jp054148m, J. Phys. Chem. A 110 (2006), 7, 2474-2485. DOI10.1021/jp054148m
  14. Luus, R., Iterative Dynamic Programming., Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, 2000. Zbl1158.49038MR1750212
  15. Ma, X., Zabaras, N., 10.1016/j.jcp.2010.01.033, J. Comput. Phys. 229 (2010), 3884-3915. Zbl1189.65019MR2609758DOI10.1016/j.jcp.2010.01.033
  16. Matoušek, J., Nešetřil, J., Invitation to Discrete Mathematics., Clarendon Press, 1998. Zbl1152.05002MR1668997
  17. Miller, W., Sutton, R., Werbos, P., Neural Networks for Control. Neural Network Modeling and Connectionism., Mit Press, 1995. 
  18. Olsson, C., Eriksson, A., Kahl, F., Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming., In: Computer Vision and Pattern Recognition, 2007. 
  19. Peterka, V., Bayesian system identification., In: Trends and Progress in System Identification (P. Eykhoff, ed.), Pergamon Press, Oxford 1981, pp. 239-304. Zbl0451.93059MR0746139
  20. Pištěk, M., On implicit approximation of the bellman equation., In: 15th IFAC Symposium on System Identification, Saint-Malo 2009. 
  21. Powell, W. B., Approximate Dynamic Programming: Solving the Curses of Dimensionality., Wiley-Interscience, 2007. Zbl1242.90002MR2347698
  22. Rabitz, H., Alis, O., 10.1023/A:1019188517934, J. Math. Chem. 25 (1999), 197-233. Zbl0957.93004MR1731292DOI10.1023/A:1019188517934
  23. Rahman, S., 10.1002/nme.2394, Internat. J. Numer. Methods in Engrg. 76 (2008), 2091-2116. Zbl1195.74310MR2475282DOI10.1002/nme.2394
  24. Rojas, M., Santos, S. A., Sorensen, D. C., 10.1137/S105262349928887X, SIAM J. Optim. 11 (2000), 611-646. Zbl0994.65067MR1814035DOI10.1137/S105262349928887X
  25. Schrijver, A., Theory of Linear and Integer Programming., Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1998. Zbl0970.90052MR0874114
  26. Sorensen, D. C., 10.1137/0719026, SIAM J. Numer. Anal. 19 (1982), 2, 409-426. Zbl0483.65039MR0650060DOI10.1137/0719026
  27. Sutton, R. S., Barto, A. G., Reinforcement Learning: An Introduction., MIT Press, 1998. 

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.