Epoch-incremental reinforcement learning algorithms

Roman Zajdel

International Journal of Applied Mathematics and Computer Science (2013)

  • Volume: 23, Issue: 3, pages: 623-635
  • ISSN: 1641-876X

Abstract

In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is executed and an environment model is built. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances, together with the reinforcement signal of the terminal state, are used to improve the agent's policy.
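
The abstract describes a two-mode scheme but gives no implementation details here. The Python sketch below is one plausible, hedged reading of it: the grid-world environment, the epsilon-greedy action selection, the reward values, and the concrete epoch-mode update rule (refreshing each past-active state's value with the terminal reward discounted by its model-derived distance to the terminal state) are illustrative assumptions, not details taken from the paper.

import random
from collections import defaultdict, deque

# Illustrative sketch only; the environment, hyperparameters and the epoch
# update rule below are assumptions, not the paper's exact algorithm.
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1
ACTIONS = ["up", "down", "left", "right"]


class GridWorld:
    """Toy deterministic grid world with a single rewarded terminal state."""

    def __init__(self, width=5, height=5, goal=(4, 4)):
        self.width, self.height, self.goal = width, height, goal

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        x, y = self.state
        self.state = (min(max(x + dx, 0), self.width - 1),
                      min(max(y + dy, 0), self.height - 1))
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done


def policy(s, V, model):
    """Epsilon-greedy choice, using the learned model to look one step ahead."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    known = [(V[model[(s, a)]], a) for a in ACTIONS if (s, a) in model]
    return max(known)[1] if known else random.choice(ACTIONS)


def td0_episode(env, V, model, max_steps=500):
    """Incremental mode: TD(0) value updates plus recording of observed transitions."""
    s = env.reset()
    visited, r = set(), 0.0
    for _ in range(max_steps):
        a = policy(s, V, model)
        s_next, r, done = env.step(a)
        target = r + (0.0 if done else GAMMA * V[s_next])
        V[s] += ALPHA * (target - V[s])        # TD(0) update
        model[(s, a)] = s_next                 # environment model entry
        visited.add(s)                         # past-active state
        s = s_next
        if done:
            break
    return visited, r


def epoch_update(V, model, terminal_state, terminal_reward, visited):
    """Epoch mode: breadth-first search the learned model backwards from the
    terminal state to obtain distances, then refresh past-active states with
    the discounted terminal reinforcement (an assumed concrete form of the
    policy-improvement step)."""
    predecessors = defaultdict(list)
    for (s, _a), s_next in model.items():
        predecessors[s_next].append(s)
    dist, queue = {terminal_state: 0}, deque([terminal_state])
    while queue:
        s = queue.popleft()
        for p in predecessors[s]:
            if p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    for s in visited:
        if s in dist:
            V[s] = max(V[s], (GAMMA ** dist[s]) * terminal_reward)


if __name__ == "__main__":
    env, V, model = GridWorld(), defaultdict(float), {}
    for episode in range(50):
        visited, terminal_reward = td0_episode(env, V, model)
        epoch_update(V, model, env.goal, terminal_reward, visited)
    print("value of start state:", V[(0, 0)])

Read this way, the epoch phase plays a role similar to planning in Dyna-style architectures (cf. Sutton, 1990, in the references below): the learned model lets the terminal reward propagate to states the episode visited long before termination, instead of waiting for many incremental TD updates.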

How to cite


Roman Zajdel. "Epoch-incremental reinforcement learning algorithms." International Journal of Applied Mathematics and Computer Science 23.3 (2013): 623-635. <http://eudml.org/doc/262322>.

@article{RomanZajdel2013,
abstract = {In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is executed and an environment model is built. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances, together with the reinforcement signal of the terminal state, are used to improve the agent's policy.},
author = {Roman Zajdel},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {reinforcement learning; epoch-incremental algorithm; grid world},
language = {eng},
number = {3},
pages = {623-635},
title = {Epoch-incremental reinforcement learning algorithms},
url = {http://eudml.org/doc/262322},
volume = {23},
year = {2013},
}

TY - JOUR
AU - Roman Zajdel
TI - Epoch-incremental reinforcement learning algorithms
JO - International Journal of Applied Mathematics and Computer Science
PY - 2013
VL - 23
IS - 3
SP - 623
EP - 635
AB - In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is executed and an environment model is built. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances, together with the reinforcement signal of the terminal state, are used to improve the agent's policy.
LA - eng
KW - reinforcement learning; epoch-incremental algorithm; grid world
UR - http://eudml.org/doc/262322
ER -

References

  1. Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124. 
  2. Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-847. 
  3. Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318. 
  4. Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh. 
  5. Ernst, D., Geurts, P. and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning, Journal of Machine Learning Research 6: 503-556. Zbl1222.68193
  6. Forbes, J. R. N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California, Berkeley, CA. 
  7. Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280. 
  8. Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237-285. 
  9. Krawiec, K., Jaśkowski, W.G. and Szubert, M.G. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3. Zbl1286.91034
  10. Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research 4: 1107-1149. Zbl1094.68080
  11. Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342. 
  12. Lin, L.-J. (1993). Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA. 
  13. Markowska-Kaczmar, U. and Kwaśnicka, H. (2005). Neural Networks Applications, Wrocław University of Technology Press, Wrocław, (in Polish). 
  14. Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104. 
  15. Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276. Zbl0924.68157
  16. Peng, J. and Williams, R. (1993). Efficient learning and planning within the Dyna framework, Adaptive Behavior 1(4): 437-454. 
  17. Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-1, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz. 
  18. Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196. 
  19. Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge. 
  20. Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410. 
  21. Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224. 
  22. Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357. 
  23. Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA. 
  24. Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039. 
  25. Watkins, C. (1989). Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge. 
  26. Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358. 
  27. Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research 7: 877-917. Zbl1222.68330
  28. Ye, C., Yung, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27. 
  29. Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366. 
