Abstract

Programming computers to play board games against human players has long served as a benchmark for progress in artificial intelligence. The standard approach to computer game playing is to search for the best move from a given game state using minimax search with a static evaluation function. The static evaluation function is critical to playing strength, but its design often relies on human expert players. This paper discusses how temporal difference (TD) learning can be used to construct a static evaluation function through self-play, and evaluates its effectiveness under various parameter settings. The game of Kalah, a non-chance game of moderate complexity, is chosen as a testbed. The empirical results show that TD learning is particularly promising for constructing a good evaluation function for the endgame and can substantially improve overall playing performance when learning the entire game.

DOI: 10.18495/comengapp.21.175184