Approximate dynamic programming based on high dimensional model representation

Miroslav Pištěk

Approximate dynamic programming based on high dimensional model representation

Miroslav Pištěk

Kybernetika (2013)

Volume: 49, Issue: 5, page 720-737
ISSN: 0023-5954

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.

How to cite

top

MLA
BibTeX
RIS

Pištěk, Miroslav. "Approximate dynamic programming based on high dimensional model representation." Kybernetika 49.5 (2013): 720-737. <http://eudml.org/doc/260675>.

@article{Pištěk2013,
abstract = {This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.},
author = {Pištěk, Miroslav},
journal = {Kybernetika},
keywords = {approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem; approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem},
language = {eng},
number = {5},
pages = {720-737},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Approximate dynamic programming based on high dimensional model representation},
url = {http://eudml.org/doc/260675},
volume = {49},
year = {2013},
}

TY - JOUR
AU - Pištěk, Miroslav
TI - Approximate dynamic programming based on high dimensional model representation
JO - Kybernetika
PY - 2013
PB - Institute of Information Theory and Automation AS CR
VL - 49
IS - 5
SP - 720
EP - 737
AB - This article introduces an algorithm for implicit High Dimensional Model Representation (HDMR) of the Bellman equation. This approximation technique reduces memory demands of the algorithm considerably. Moreover, we show that HDMR enables fast approximate minimization which is essential for evaluation of the Bellman function. In each time step, the problem of parametrized HDMR minimization is relaxed into trust region problems, all sharing the same matrix. Finding its eigenvalue decomposition, we effectively achieve estimates of all minima. Their full-domain representation is avoided by HDMR and then the same approach is used recursively in the next time step. An illustrative example of N-armed bandit problem is included. We assume that the newly established connection between approximate HDMR minimization and the trust region problem can be beneficial also to many other applications.
LA - eng
KW - approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem; approximate dynamic programming; Bellman equation; approximate HDMR minimization; trust region problem
UR - http://eudml.org/doc/260675
ER -

References

top

Busygin, S., A new trust region technique for the maximum weight clique problem., Discrete Appl. Math. 154 (2002), 2006. Zbl1111.90020 MR2257814
Demiralp, M., High dimensional model representation and its application varieties., In: Proc. Fourth International Conference on Tools for Mathematical Modelling, St. Petersburg 2003, pp. 146-159. Zbl1237.93093
Feil, K. S., Shah, N., 10.1002/wilj.18, Wilmott J. 1 (2009), 179-195. DOI10.1002/wilj.18
Garivier, A., Cappé, O., The KL-UCB algorithm for bounded stochastic bandits and beyond., In: Proc. 24th Annual Conference on Learning Theory, Budapest 2011.
George, A., Powell, W. B., Kulkarni, S. R., Value function approximation using multiple aggregation for multiattribute resource management., J. Machine Learning Res. 9 (2008), 2079-2111. Zbl1225.68180
Gittins, J., Bandit processes and dynamic allocation indices., J. Roy. Statist. Soc. Ser.B 41 (1979), 2, 148-177. Zbl0411.62055 MR0547241
Gosavi, A., 10.1287/ijoc.1080.0305, INFORMS J. Comput. 21 (2009), 178-192. Zbl1243.68240 MR2549123 DOI10.1287/ijoc.1080.0305
Hauskrecht, M., Value-function approximations for partially observable markov decision processes., J. Artif. Internat. Res. 13 (2000), 33-94. Zbl0946.68131 MR1781862
Jaakkola, T., Singh, S. P., Jordan, M. I., Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems., MIT Press, 1995.
Karp, R. M., 10.1007/978-1-4684-2001-2_9, In: Complexity of Computer Computations (R. Miller and J. W. Thatcher, eds.), Plenum, New York 1972. Zbl1187.90014 MR0378476 DOI10.1007/978-1-4684-2001-2_9
Kushner, H., Introduction to Stochastic Control., Holt, Rinehart and Winston, New York 1970. Zbl0293.93018 MR0280248
LeBlanc, M., Tibshirani, R., Combining estimates in regression and classification., J. Amer. Statist. Assoc. 91 (1996), 1641-1650. Zbl0881.62046 MR1439105
Li, G., Hu, J., Wang, S.-W., Georgopoulos, P. G., Schoendorf, J., Rabitz, H., 10.1021/jp054148m, J. Phys. Chem. A 110 (2006), 7, 2474-2485. DOI10.1021/jp054148m
Luus, R., Iterative Dynamic Programming., Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, 2000. Zbl1158.49038 MR1750212
Ma, X., Zabaras, N., 10.1016/j.jcp.2010.01.033, J. Comput. Phys. 229 (2010), 3884-3915. Zbl1189.65019 MR2609758 DOI10.1016/j.jcp.2010.01.033
Matoušek, J., Nešetřil, J., Invitation to Discrete Mathematics., Clarendon Press, 1998. Zbl1152.05002 MR1668997
Miller, W., Sutton, R., Werbos, P., Neural Networks for Control. Neural Network Modeling and Connectionism., Mit Press, 1995.
Olsson, C., Eriksson, A., Kahl, F., Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming., In: Computer Vision and Pattern Recognition, 2007.
Peterka, V., Bayesian system identification., In: Trends and Progress in System Identification (P. Eykhoff, ed.), Pergamon Press, Oxford 1981, pp. 239-304. Zbl0451.93059 MR0746139
Pištěk, M., On implicit approximation of the bellman equation., In: 15th IFAC Symposium on System Identification, Saint-Malo 2009.
Powell, W. B., Approximate Dynamic Programming: Solving the Curses of Dimensionality., Wiley-Interscience, 2007. Zbl1242.90002 MR2347698
Rabitz, H., Alis, O., 10.1023/A:1019188517934, J. Math. Chem. 25 (1999), 197-233. Zbl0957.93004 MR1731292 DOI10.1023/A:1019188517934
Rahman, S., 10.1002/nme.2394, Internat. J. Numer. Methods in Engrg. 76 (2008), 2091-2116. Zbl1195.74310 MR2475282 DOI10.1002/nme.2394
Rojas, M., Santos, S. A., Sorensen, D. C., 10.1137/S105262349928887X, SIAM J. Optim. 11 (2000), 611-646. Zbl0994.65067 MR1814035 DOI10.1137/S105262349928887X
Schrijver, A., Theory of Linear and Integer Programming., Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1998. Zbl0970.90052 MR0874114
Sorensen, D. C., 10.1137/0719026, SIAM J. Numer. Anal. 19 (1982), 2, 409-426. Zbl0483.65039 MR0650060 DOI10.1137/0719026
Sutton, R. S., Barto, A. G., Reinforcement Learning: An Introduction., MIT Press, 1998.

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.