Evolving small-board Go players using coevolutionary temporal difference learning with archives

Krzysztof Krawiec; Wojciech Jaśkowski; Marcin Szubert

International Journal of Applied Mathematics and Computer Science (2011)

  • Volume: 21, Issue: 4, pages 717-731
  • ISSN: 1641-876X

Abstract

We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interweaves two search processes operating in intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archive-based approach is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations for small-board Go, but also to other games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
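To make the two interwoven search processes concrete, the following minimal Java sketch illustrates the intra-game component: a gradient-descent TD(lambda) update for a weighted piece counter evaluation function. The constants and encodings used here (the 5x5 board size, alpha, lambda, the tanh squashing) are illustrative assumptions, not the paper's settings.

// Minimal sketch of gradient-descent TD(lambda) for a weighted piece counter (WPC).
// All names and constants (N, alpha, lambda, the board encoding) are illustrative
// assumptions, not the authors' implementation or parameter settings.
public class WpcTdLearner {
    static final int N = 5 * 5;        // one weight per intersection of a 5x5 board
    final double[] w = new double[N];  // WPC weights (the learned strategy)
    final double[] e = new double[N];  // eligibility traces
    final double alpha = 0.01;         // learning rate (assumed value)
    final double lambda = 0.5;         // trace decay constant (assumed value)

    // Board encoded as +1 (black stone), -1 (white stone), 0 (empty point).
    double evaluate(double[] x) {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += w[i] * x[i];
        return Math.tanh(s);           // squash the weighted sum to (-1, 1)
    }

    // One update after a transition from state x to its successor xNext;
    // for a terminal xNext the target would be the game outcome instead.
    void update(double[] x, double[] xNext) {
        double v = evaluate(x);
        double delta = evaluate(xNext) - v;      // temporal difference error
        double grad = 1.0 - v * v;               // derivative of tanh at the weighted sum
        for (int i = 0; i < N; i++) {
            e[i] = lambda * e[i] + grad * x[i];  // decay and accumulate traces
            w[i] += alpha * delta * e[i];        // gradient-descent weight update
        }
    }
}

The inter-game component can be sketched, under the same caveats, as a generational coevolutionary loop with a champion archive. The selection scheme, mutation operator, and archive policy below are generic placeholders (a Hall-of-Fame-style memory), not the paper's exact design; playGame and mutate are hypothetical helpers.

import java.util.*;

// Generic sketch of one generation of coevolution with a champion archive.
// playGame and mutate are hypothetical helpers; the actual CTDL operators
// and archive policy may differ.
public class CoevolutionSketch {
    static final Random RNG = new Random();

    // Fitness = wins against current peers plus the archived champions.
    static double fitness(double[] ind, List<double[]> pop, List<double[]> archive) {
        double wins = 0;
        for (double[] opp : pop) if (opp != ind && playGame(ind, opp) > 0) wins++;
        for (double[] opp : archive) if (playGame(ind, opp) > 0) wins++;
        return wins;
    }

    static List<double[]> generation(List<double[]> pop, List<double[]> archive) {
        // Precompute fitness so the sort below sees consistent values.
        final Map<double[], Double> fit = new IdentityHashMap<>();
        for (double[] ind : pop) fit.put(ind, fitness(ind, pop, archive));
        pop.sort((a, b) -> Double.compare(fit.get(b), fit.get(a)));
        archive.add(pop.get(0).clone());           // remember the current champion
        List<double[]> next = new ArrayList<>();
        for (double[] parent : pop.subList(0, pop.size() / 2)) {
            next.add(parent);                      // keep the better half
            next.add(mutate(parent));              // and its mutated offspring
        }
        return next;
    }

    // Placeholder game: +1 if the first strategy wins, -1 otherwise.
    static int playGame(double[] a, double[] b) { return RNG.nextBoolean() ? 1 : -1; }

    static double[] mutate(double[] p) {
        double[] c = p.clone();
        c[RNG.nextInt(c.length)] += 0.1 * RNG.nextGaussian();  // Gaussian perturbation
        return c;
    }
}

Roughly, CTDL interweaves the two sketches above: strategies refined by TD updates during play form the population that the coevolutionary generation then evaluates and varies, with the relative intensity of the two phases left as a tunable parameter.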

How to cite


Krawiec, Krzysztof, Wojciech Jaśkowski, and Marcin Szubert. "Evolving small-board Go players using coevolutionary temporal difference learning with archives." International Journal of Applied Mathematics and Computer Science 21.4 (2011): 717-731. <http://eudml.org/doc/208083>.

@article{KrzysztofKrawiec2011,
abstract = {We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interweaves two search processes operating in intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archive-based approach is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations for small-board Go, but also to other games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.},
author = {Krzysztof Krawiec and Wojciech Jaśkowski and Marcin Szubert},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {temporal difference learning; coevolution; small-board Go; exploration vs. exploitation; games},
language = {eng},
number = {4},
pages = {717-731},
title = {Evolving small-board Go players using coevolutionary temporal difference learning with archives},
url = {http://eudml.org/doc/208083},
volume = {21},
year = {2011},
}

TY - JOUR
AU - Krzysztof Krawiec
AU - Wojciech Jaśkowski
AU - Marcin Szubert
TI - Evolving small-board Go players using coevolutionary temporal difference learning with archives
JO - International Journal of Applied Mathematics and Computer Science
PY - 2011
VL - 21
IS - 4
SP - 717
EP - 731
AB - We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interweaves two search processes operating in intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archive-based approach is superior to the other techniques considered here, producing strategies that outperform a handcrafted weighted piece counter and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations for small-board Go, but also to other games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
LA - eng
KW - temporal difference learning; coevolution; small-board Go; exploration vs. exploitation; games
UR - http://eudml.org/doc/208083
ER -

