Abstract

Programming computers to play board games against human players has long served as a benchmark for progress in artificial intelligence. The standard approach to computer game playing is to search for the best move from a given game state using minimax search with a static evaluation function. The static evaluation function is critical to playing strength, but its design often relies on human expert players. This paper discusses how temporal difference (TD) learning can be used to construct a static evaluation function through self-play, and evaluates its effectiveness under various parameter settings. The game of Kalah, a non-chance game of moderate complexity, is chosen as a testbed. The empirical results show that TD learning is particularly promising for constructing a good evaluation function for the endgame and can substantially improve overall playing performance when learning the entire game.

DOI: 10.18495/comengapp.21.175184