# Epoch-incremental reinforcement learning algorithms

International Journal of Applied Mathematics and Computer Science (2013)

- Volume: 23, Issue: 3, pages 623-635
- ISSN: 1641-876X

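## Abstract

In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the basic TD(0) or TD(λ) algorithm is run and a model of the environment is built. In the epoch mode, the environment model is used to compute the distances from past-active states to the terminal state. These distances, together with the reinforcement signal received in the terminal state, are used to improve the agent's policy.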

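The two-mode scheme summarized in the abstract can be sketched in Python under several assumptions of our own: a small deterministic grid world (hypothetical layout `WALLS`, `START`, `GOAL`) with a single reward on reaching the terminal cell, one-step Q-learning standing in for the TD(0) incremental update (the TD(λ) variant with eligibility traces is omitted), a dictionary model of observed `(state, action) -> next state` transitions, and breadth-first search over that model for the distances to the terminal state. The `epoch_update` rule, which raises each modeled Q(s, a) to R·γ^d(s'), is a hypothetical illustration of how distances and the terminal reinforcement might be used to improve the policy; it is not the update rule from the article.

```python
import random
from collections import deque

# Hypothetical deterministic 5x5 grid world (an assumption, not taken from the paper):
# the only reward is R_GOAL on entering the terminal cell GOAL.
ROWS, COLS = 5, 5
WALLS = {(1, 1), (2, 1), (3, 3)}
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
R_GOAL, GAMMA, ALPHA, EPSILON = 1.0, 0.9, 0.5, 0.1


def step(state, a):
    """Move deterministically; hitting a wall or the border leaves the agent in place."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        r, c = state
    nxt = (r, c)
    return nxt, (R_GOAL if nxt == GOAL else 0.0), nxt == GOAL


def choose_action(q, s, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[s])
    return rng.choice([a for a, v in enumerate(q[s]) if v == best])


def incremental_episode(q, model, rng, max_steps=200):
    """Incremental mode: a one-step Q-learning (TD(0)-style) update per step,
    while every observed transition is stored in a deterministic model."""
    s = START
    for _ in range(max_steps):
        a = choose_action(q, s, rng)
        s2, reward, done = step(s, a)
        model[(s, a)] = s2  # environment model: (state, action) -> next state
        target = reward if done else reward + GAMMA * max(q[s2])
        q[s][a] += ALPHA * (target - q[s][a])
        if done:
            return True
        s = s2
    return False


def distances_to_goal(model):
    """Epoch mode, part 1: breadth-first search backwards from GOAL over the
    model gives the fewest known steps from each past-active state to GOAL."""
    predecessors = {}
    for (s, _a), s2 in model.items():
        predecessors.setdefault(s2, set()).add(s)
    dist = {GOAL: 0}
    frontier = deque([GOAL])
    while frontier:
        node = frontier.popleft()
        for p in predecessors.get(node, ()):
            if p not in dist:
                dist[p] = dist[node] + 1
                frontier.append(p)
    return dist


def epoch_update(q, model, dist):
    """Epoch mode, part 2 (hypothetical rule, not the paper's): raise Q(s, a)
    to the discounted terminal reward implied by the successor's distance."""
    for (s, a), s2 in model.items():
        if s2 in dist:
            q[s][a] = max(q[s][a], R_GOAL * GAMMA ** dist[s2])


def train(episodes=30, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    model = {}
    for _ in range(episodes):
        # The epoch phase runs only after an episode has reached the terminal state.
        if incremental_episode(q, model, rng):
            epoch_update(q, model, distances_to_goal(model))
    return q, model


def greedy_path_length(q, max_steps=50):
    """Number of steps the purely greedy policy needs to reach GOAL from START."""
    s, steps = START, 0
    while s != GOAL and steps < max_steps:
        row = q[s]
        s, _, _ = step(s, max(range(len(row)), key=row.__getitem__))
        steps += 1
    return steps


if __name__ == "__main__":
    q, model = train()
    print("greedy path length:", greedy_path_length(q))
```

With only a terminal reward and deterministic moves, R·γ^d(s') is the discounted return of taking action a and then reaching the goal in d(s') further steps, so in this sketch one epoch pass propagates the terminal signal to every past-active state at once instead of one step per visit, as plain TD(0) would.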
## How to cite

Roman Zajdel. "Epoch-incremental reinforcement learning algorithms." International Journal of Applied Mathematics and Computer Science 23.3 (2013): 623-635. <http://eudml.org/doc/262322>.

    @article{RomanZajdel2013,
        abstract = {In this article, a new class of the epoch-incremental reinforcement learning algorithm is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the reinforcement terminal state signal are used to improve the agent policy.},
        author = {Roman Zajdel},
        journal = {International Journal of Applied Mathematics and Computer Science},
        keywords = {reinforcement learning; epoch-incremental algorithm; grid world},
        language = {eng},
        number = {3},
        pages = {623-635},
        title = {Epoch-incremental reinforcement learning algorithms},
        url = {http://eudml.org/doc/262322},
        volume = {23},
        year = {2013},
    }

    TY  - JOUR
    AU  - Roman Zajdel
    TI  - Epoch-incremental reinforcement learning algorithms
    JO  - International Journal of Applied Mathematics and Computer Science
    PY  - 2013
    VL  - 23
    IS  - 3
    SP  - 623
    EP  - 635
    AB  - In this article, a new class of the epoch-incremental reinforcement learning algorithm is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the reinforcement terminal state signal are used to improve the agent policy.
    LA  - eng
    KW  - reinforcement learning; epoch-incremental algorithm; grid world
    UR  - http://eudml.org/doc/262322
    ER  -

## References

- Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124.
- Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-847.
- Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318.
- Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh.
- Ernst, D., Geurts, P. and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning, Journal of Machine Learning Research 6: 503-556. Zbl1222.68193
- Forbes, J. R. N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California, Berkeley, CA.
- Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280.
- Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237-285.
- Krawiec, K., Jaśkowski, W.G. and Szubert, M.G. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3. Zbl1286.91034
- Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research 4: 1107-1149. Zbl1094.68080
- Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342.
- Lin, L.-J. (1993). Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
- Markowska-Kaczmar, U. and Kwaśnicka, H. (2005). Neural Networks Applications, Wrocław University of Technology Press, Wrocław, (in Polish).
- Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104.
- Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276. Zbl0924.68157
- Peng, J. and Williams, R. (1993). Efficient learning and planning within the Dyna framework, Adaptive Behavior 1(4): 437-454.
- Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-1, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz.
- Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196.
- Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge.
- Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410.
- Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224.
- Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357.
- Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
- Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039.
- Watkins, C. (1989). Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge.
- Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358.
- Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research 7: 877-917. Zbl1222.68330
- Ye, C., Yung, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27.
- Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366.
