Reinforcement Learning with Open Spiel (Part 1)
There is a certain game at my parent’s home that I really like. It is similar in spirit to the game of Connect Four, but in 3D. Instead of inserting tokens in 7 columns, a \(4 \times 4 \times 4\) grid is used. The goal remains the same: align four of your tokens to win, be it in a row, in a diagonal …

Since comming to Japan, I was lucky to find a bar in Kobe where the patron has this game. I played a few games against him, and to my great shame, lost more games than I want to admit. But not to despair, I just remembered I am a computer scientist by trade. Surely I should be able to make a program to play in my stead! Join me on my small adventure into Reinforcement Learning and Python programming.
Discovering the Open Spiel framework
As luck would have it, there is a repository available on GitHub called Open Spiel which contains a fair number of games and machine learning algorithms. Almost the whole project is implemented in C++, but there are Python bindings available and it can be installed a Python package via pip.
By installing the open_spiel library through pip and then downloading the source files of the project, we can use existing Python sample programs to launch a game of Tic-Tac-Toe.
$ git clone https://github.com/google-deepmind/open_spiel.git
$ cd open_spiel
$ pip install -r requirements.txt
$ pip install open_spiel
$ python open_spiel/python/example/example.py --game_string="tic_tac_toe"
Registered games:
[<GameType '2048'>, <GameType 'add_noise'>, <GameType 'amazons'>, <GameType 'backgammon'>, <GameType 'bargaining'>, <GameType 'battleship'>, <GameType 'blackjack'>, <GameType 'blotto'>, <GameType 'breakthrough'>, <GameType 'bridge'>, <GameType 'bridge_uncontested_bidding'>, <GameType 'cached_tree'>, <GameType 'catch'>, <GameType 'chat_game'>, <GameType 'checkers'>, <GameType 'chess'>, <GameType 'cliff_walking'>, <GameType 'clobber'>, <GameType 'coin_game'>, <GameType 'colored_trails'>, <GameType 'connect_four'>, <GameType 'coop_box_pushing'>, <GameType 'coop_to_1p'>, <GameType 'coordinated_mp'>, <GameType 'crazy_eights'>, <GameType 'cursor_go'>, <GameType 'dark_chess'>, <GameType 'dark_hex'>, <GameType 'dark_hex_ir'>, <GameType 'deep_sea'>, <GameType 'dots_and_boxes'>, <GameType 'dou_dizhu'>, <GameType 'efg_game'>, <GameType 'einstein_wurfelt_nicht'>, <GameType 'euchre'>, <GameType 'first_sealed_auction'>, <GameType 'gin_rummy'>, <GameType 'go'>, <GameType 'goofspiel'>, <GameType 'havannah'>, <GameType 'hearts'>, <GameType 'hex'>, <GameType 'kriegspiel'>, <GameType 'kuhn_poker'>, <GameType 'laser_tag'>, <GameType 'leduc_poker'>, <GameType 'lewis_signaling'>, <GameType 'liars_dice'>, <GameType 'liars_dice_ir'>, <GameType 'maedn'>, <GameType 'mancala'>, <GameType 'markov_soccer'>, <GameType 'matching_pennies_3p'>, <GameType 'matrix_bos'>, <GameType 'matrix_brps'>, <GameType 'matrix_cd'>, <GameType 'matrix_coordination'>, <GameType 'matrix_mp'>, <GameType 'matrix_pd'>, <GameType 'matrix_rps'>, <GameType 'matrix_rpsw'>, <GameType 'matrix_sh'>, <GameType 'matrix_shapleys_game'>, <GameType 'mfg_crowd_modelling'>, <GameType 'mfg_crowd_modelling_2d'>, <GameType 'mfg_dynamic_routing'>, <GameType 'mfg_garnet'>, <GameType 'misere'>, <GameType 'mnk'>, <GameType 'morpion_solitaire'>, <GameType 'negotiation'>, <GameType 'nfg_game'>, <GameType 'nim'>, <GameType 'nine_mens_morris'>, <GameType 'normal_form_extensive_game'>, <GameType 'oh_hell'>, <GameType 'oshi_zumo'>, <GameType 'othello'>, <GameType 'oware'>, <GameType 'pathfinding'>, <GameType 'pentago'>, <GameType 'phantom_go'>, <GameType 'phantom_ttt'>, <GameType 'phantom_ttt_ir'>, <GameType 'pig'>, <GameType 'python_block_dominoes'>, <GameType 'python_connect_four_3d'>, <GameType 'python_dynamic_routing'>, <GameType 'python_iterated_prisoners_dilemma'>, <GameType 'python_kuhn_poker'>, <GameType 'python_liars_poker'>, <GameType 'python_team_dominoes'>, <GameType 'python_tic_tac_toe'>, <GameType 'quoridor'>, <GameType 'rbc'>, <GameType 'repeated_game'>, <GameType 'restricted_nash_response'>, <GameType 'sheriff'>, <GameType 'skat'>, <GameType 'solitaire'>, <GameType 'spades'>, <GameType 'start_at'>, <GameType 'stones_and_gems'>, <GameType 'tarok'>, <GameType 'tic_tac_toe'>, <GameType 'tiny_bridge_2p'>, <GameType 'tiny_bridge_4p'>, <GameType 'tiny_hanabi'>, <GameType 'trade_comm'>, <GameType 'turn_based_simultaneous_game'>, <GameType 'twixt'>, <GameType 'ultimate_tic_tac_toe'>, <GameType 'y'>, <GameType 'zerosum'>]
Creating game: tic_tac_toe
...
...
...
Player 0 , randomly sampled action: x(0,0)
x..
...
...
Player 1 , randomly sampled action: o(2,1)
x..
...
.o.
Player 0 , randomly sampled action: x(2,0)
x..
...
xo.
Player 1 , randomly sampled action: o(1,2)
x..
..o
xo.
Player 0 , randomly sampled action: x(2,2)
x..
..o
xox
Player 1 , randomly sampled action: o(0,2)
x.o
..o
xox
Player 0 , randomly sampled action: x(1,0)
x.o
x.o
xox
Utility for player 0 is 1.0
Utility for player 1 is -1.0
In this example.py
program, agents that pick an action at random are initialized for each of the players and a game is played. The Utility shown on the bottom two lines indicates who won the game:
- a value of \(1.0\) means that player won,
- a value of \(-1.0\) means that player lost,
- a value of \(0.0\) for both players indicate a draw.
In the game above, the first player (Player 0) managed to align their token in the first column, winning the game.
As shown at the beginning of the program, there are many games implemented in Open Spiel. You can launch a random playthrough of any of the implemented games, just by setting the --game_string
flag when launching the program. This also means that if I am able to implement my game according to Open Spiel’s API, I should be able to reuse all the provided example programs with my own game.
Installing Open Spiel locally
Since I want to implement my own game into the framework and that there appears to be a large number of generic programs already integrated into the framework, I decided to add my own program to the library. This means that I need to compile and install Open Spiel via the sources, and not via the package available for install by pip.
$ ./install.sh # Installs the packages necessary to build the C++ Open Spiel
$ pip install -r requirements.txt # Installs the necessary Python packages
$ ./open_spiel/scripts/build_and_run_tests.sh # Builds Open Spiel with all the games and runs the tests
$ pip install . # Installs the compiled Open Spiel Python bindings
When developing my program, I realized I needed to uninstall and reinstall the open_spiel using pip every time I made changes to my program. Otherwise, the “previous” version of the Python programs is kept when launching programs. This is done using the following command:
$ pip uninstall open_spiel
$ ./open_spiel/scripts/build_and_run_tests.sh
$ pip install .
Implementing the game
As luck would have it, there are a number of games implemented in Python. So I was able to take inspiration from these. The 3D Connect Four game is very similar to Tic-Tac-Toe. Both of these games are sequential, two-players, perfect information, zero-sum, deterministic games. This means that the two players play one after the other, there is no hidden information (unlike poker where the hand of your opponents is unknown to you), one player’s gain is the other player’s loss, and there is no chance involved in the outcome of a game (no throw of dices).
This information is important because it is needed by the Open Spiel framework to run your game. Through the Python API, this information is defined in pyspiel.GameType
and pyspiel.GameInfo
as shown below.
_NUM_PLAYERS = 2
_BOARD_FILES = 4
_BOARD_HEIGHT = 4
_BOARD_RANKS = 4
_MAX_GAME_LENGTH = _BOARD_FILES * _BOARD_RANKS * _BOARD_HEIGHT
_NB_POSSIBLE_ACTIONS = _BOARD_FILES * _BOARD_RANKS
_GAME_TYPE = pyspiel.GameType(
short_name="python_connect_four_3d",
long_name="Python Connect Four 3D",
dynamics=pyspiel.GameType.Dynamics.SEQUENTIAL,
chance_mode=pyspiel.GameType.ChanceMode.DETERMINISTIC,
information=pyspiel.GameType.Information.PERFECT_INFORMATION,
utility=pyspiel.GameType.Utility.ZERO_SUM,
reward_model=pyspiel.GameType.RewardModel.TERMINAL,
max_num_players=_NUM_PLAYERS,
min_num_players=_NUM_PLAYERS,
provides_information_state_string=True,
provides_information_state_tensor=True,
provides_observation_string=True,
provides_observation_tensor=True,
provides_factored_observation_string=True)
_GAME_INFO = pyspiel.GameInfo(
num_distinct_actions=_NB_POSSIBLE_ACTIONS,
max_chance_outcomes=0,
num_players=_NUM_PLAYERS,
min_utility=-1.0,
max_utility=1.0,
utility_sum=0.0,
max_game_length= _MAX_GAME_LENGTH) # If all the pieces are placed and a draw is reached
Game
and State
for the core game mechanisms
The two main classes that you need to extend and implement are pyspiel.Game
and pyspiel.State
. The former is essentially a factory class used to create Games. The latter is the class that actually retains all the information about the current playthrough, including who is the next player, what the state of the board is etc.
Game
implementation
Implementing the Game
class is rather straightforward. The constructor merely carries the game information contained in _GAME_TYPE
to the parent class. Method new_initial_state
creates a clean instance of the State
class implementing the game, while method make_py_observer
provides a view of the current game state using the Connect4_3D_Observer
class.
class Connect4_3D_Game(pyspiel.Game):
"""A Python version of 3D connect 4."""
def __init__(self, params=None):
super().__init__(_GAME_TYPE,_GAME_INFO, params or dict())
def new_initial_state(self):
"""Returns a state corresponding to the start of a game."""
return Connect4_3D_State(self)
def make_py_observer(self, iig_obs_type=None, params=None):
"""Returns an object used for observing game state."""
if ((iig_obs_type is None) or \
(iig_obs_type.public_info and not iig_obs_type.perfect_recall)):
return Connect4_3D_Observer(params)
else:
return IIGObserverForPublicInfoGame(iig_obs_type, params)
State
implementation
As stated above, the class which actually implements all the game mechanics is the one that extends State
. Some notable methods that need to be implemented include:
-
current_player
which returns the id of the player who needs to make an action next
For the two-player game we are dealing with, the value alternates betwenn \(0\) and \(1\) -
_legal_actions
which defines which moves can be made by the current player
In our case, there are 16 different moves. But when a pile of token is full, that move can no longer be played and is removed from the array returned by this method. -
_apply_action
which updates the state of the game based on the move chosen by the player
This method updates the game board, checks the victory and end-game condition, and updates the current player -
__str__
converts to state of the board to a string for easy display
Although this method does not intervene in the game mechanics, it is needed when using existing programs from the Open Spiel examples that will display the current state of the board to the terminal. -
is_terminal
indicates if the game if over (either a player won or victory was reached) -
returns
when the game is over, indicates the payout to each player -
_action_to_string
converts the action picked by a player to an easy to read string
Players pick a value in the range \(\left[ 0, 15 \right]\) to designate the pole onto which to place their token. This translates to(row, file, rank)
coordinates on the board (the row is the first empty slot for the specified file and rank). I implemented it such that it shows the action number and the coordinates in one string:action(row,file,rank)
.
I will not go into further details in the implementation. The game itself is rather straightforward to implement following the example of the Python implementation of Tic-Tac-Toe. Suffice to say the most difficult part of the program is the check for victory conditions inside method apply_action
. While not technically challenging, it is very easy to make a silly mistake here.
This is a problem because if the game is not correctly programmed, an AI will learn to play the “broken” game and make plays that do not make sense. We therefore need to test our game thoroughly …
Observer
class
The last class that needs to be implemented is the “observer.” It is used to provide the observable information to the agents (the AIs) that are going to play the game. To be honest I do not fully understand what is been done here. I simply adapted the code from Tic-Tac-Toe to use a 3D array instead of a 2D array.
class Connect4_3D_Observer:
"""An observer that conforms to the PyObserver interface (observation.py)."""
def __init__(self, params):
"""Initializes an empty observation tensor."""
if params:
raise ValueError(f"Observation parameters not supported; passed {params}")
# The observation should contain a 1-D tensor in `self.tensor` and a
# dictionary of views onto the tensor, which may be of any shape.
# Here the observation is indexed `(cell state, row, column)`.
shape = (1+ _NUM_PLAYERS, _BOARD_RANKS, _BOARD_FILES, _BOARD_HEIGHT)
self.tensor = np.zeros(np.prod(shape), np.float32)
self.dict = {"observation": np.reshape(self.tensor, shape)}
# TODO not certain the shape of the tensor is appropriate (why 3 times the board cells ?)
def set_from(self, state, player):
"""Updates the `tensor` and `dict` to reflect the `state` from the PoV of `player`."""
del player # Why? I guess the player is not needed in the implementation
obs = self.dict["observation"]
obs.fill(0)
for row in range(_BOARD_HEIGHT):
for file in range(_BOARD_FILES):
for rank in range(_BOARD_RANKS):
cell_state = ".ox".index(state.board[row, file, rank]) # pick the corresponding character
obs[cell_state, row, file, rank] = 1 # this I don't understand
def string_from(self, state, player):
"""Observation of `state` from the point of view of `player`, as a string."""
del player
return _board_to_string(state.board)
def _board_to_string(board):
"""Returns a string representation of the board."""
slices = []
b = np.flip(board,0)
for i in range(b.shape[0]):
for j in range(b.shape[1]):
for k in range(b.shape[2]):
slices.append(f"{b[i, j, k]}")
slices.append("\n")
slices.append("\n")
return "".join(slices)
Game registration into Open Spiel
Finally, we need to register the newly implemented game into the list of games provided with Open Spiel so that we are able to launch it using its identifying string. This is done using the following line at the bottom of the program file connect_four_3d.py
.
pyspiel.register_game(_GAME_TYPE, Connect4_3D_Game)
In file __init__.py
the following import
needs to be added:
diff --git a/open_spiel/python/games/__init__.py b/open_spiel/python/games/__init__.py
index 33945e06..a2d42e5b 100644
--- a/open_spiel/python/games/__init__.py
+++ b/open_spiel/python/games/__init__.py
@@ -28,6 +28,7 @@ pyspiel.register_game(_GAME_TYPE, KuhnPokerGame)
from open_spiel.python.games import block_dominoes
from open_spiel.python.games import chat_game
+from open_spiel.python.games import connect_four_3d
from open_spiel.python.games import dynamic_routing
from open_spiel.python.games import iterated_prisoners_dilemma
from open_spiel.python.games import kuhn_poker
Now, any program which includes from open_spiel.python import games
will know about our new game and we should be able to tell the programs to load our new game by giving it the appropriate arguments.
Testing
First playthroughs
Now that the game is implemented, we can play it through the Open Spiel framework. Random Agent vs Random Agent games can be played using the open_spiel/python/examples/example.py
program mentionned above.
Here is the command and a randomly generated playthrough:
$ python open_spiel/python/examples/example.py --game_string="python_connect_four_3d"
Registered games:
[<GameType '2048'>, <GameType 'add_noise'>, <GameType 'amazons'>, <GameType 'backgammon'>, <GameType 'bargaining'>, <GameType 'battleship'>, <GameType 'blackjack'>, <GameType 'blotto'>, <GameType 'breakthrough'>, <GameType 'bridge'>, <GameType 'bridge_uncontested_bidding'>, <GameType 'cached_tree'>, <GameType 'catch'>, <GameType 'chat_game'>, <GameType 'checkers'>, <GameType 'chess'>, <GameType 'cliff_walking'>, <GameType 'clobber'>, <GameType 'coin_game'>, <GameType 'colored_trails'>, <GameType 'connect_four'>, <GameType 'coop_box_pushing'>, <GameType 'coop_to_1p'>, <GameType 'coordinated_mp'>, <GameType 'crazy_eights'>, <GameType 'cursor_go'>, <GameType 'dark_chess'>, <GameType 'dark_hex'>, <GameType 'dark_hex_ir'>, <GameType 'deep_sea'>, <GameType 'dots_and_boxes'>, <GameType 'dou_dizhu'>, <GameType 'efg_game'>, <GameType 'einstein_wurfelt_nicht'>, <GameType 'euchre'>, <GameType 'first_sealed_auction'>, <GameType 'gin_rummy'>, <GameType 'go'>, <GameType 'goofspiel'>, <GameType 'havannah'>, <GameType 'hearts'>, <GameType 'hex'>, <GameType 'kriegspiel'>, <GameType 'kuhn_poker'>, <GameType 'laser_tag'>, <GameType 'leduc_poker'>, <GameType 'lewis_signaling'>, <GameType 'liars_dice'>, <GameType 'liars_dice_ir'>, <GameType 'maedn'>, <GameType 'mancala'>, <GameType 'markov_soccer'>, <GameType 'matching_pennies_3p'>, <GameType 'matrix_bos'>, <GameType 'matrix_brps'>, <GameType 'matrix_cd'>, <GameType 'matrix_coordination'>, <GameType 'matrix_mp'>, <GameType 'matrix_pd'>, <GameType 'matrix_rps'>, <GameType 'matrix_rpsw'>, <GameType 'matrix_sh'>, <GameType 'matrix_shapleys_game'>, <GameType 'mfg_crowd_modelling'>, <GameType 'mfg_crowd_modelling_2d'>, <GameType 'mfg_dynamic_routing'>, <GameType 'mfg_garnet'>, <GameType 'misere'>, <GameType 'mnk'>, <GameType 'morpion_solitaire'>, <GameType 'negotiation'>, <GameType 'nfg_game'>, <GameType 'nim'>, <GameType 'nine_mens_morris'>, <GameType 'normal_form_extensive_game'>, <GameType 'oh_hell'>, <GameType 'oshi_zumo'>, <GameType 'othello'>, <GameType 'oware'>, <GameType 'pathfinding'>, <GameType 'pentago'>, <GameType 'phantom_go'>, <GameType 'phantom_ttt'>, <GameType 'phantom_ttt_ir'>, <GameType 'pig'>, <GameType 'python_block_dominoes'>, <GameType 'python_connect_four_3d'>, <GameType 'python_dynamic_routing'>, <GameType 'python_iterated_prisoners_dilemma'>, <GameType 'python_kuhn_poker'>, <GameType 'python_liars_poker'>, <GameType 'python_team_dominoes'>, <GameType 'python_tic_tac_toe'>, <GameType 'quoridor'>, <GameType 'rbc'>, <GameType 'repeated_game'>, <GameType 'restricted_nash_response'>, <GameType 'sheriff'>, <GameType 'skat'>, <GameType 'solitaire'>, <GameType 'spades'>, <GameType 'start_at'>, <GameType 'stones_and_gems'>, <GameType 'tarok'>, <GameType 'tic_tac_toe'>, <GameType 'tiny_bridge_2p'>, <GameType 'tiny_bridge_4p'>, <GameType 'tiny_hanabi'>, <GameType 'trade_comm'>, <GameType 'turn_based_simultaneous_game'>, <GameType 'twixt'>, <GameType 'ultimate_tic_tac_toe'>, <GameType 'y'>, <GameType 'zerosum'>]
Creating game: python_connect_four_3d
....
....
....
....
....
....
....
....
....
....
....
....
....
....
....
....
Player 0 , randomly sampled action: 7(0,1,3)
....
....
....
....
....
....
....
....
....
....
....
....
....
...o
....
....
Player 1 , randomly sampled action: 8(0,2,0)
....
....
....
....
....
....
....
....
....
....
....
....
....
...o
x...
....
<-- a number of plays removed for brevity -->
Player 0 , randomly sampled action: 5(2,1,1)
o...
....
....
....
x...
.o..
....
....
o..x
.x..
oo..
o...
x.ox
xx.o
xxx.
oo.o
Player 1 , randomly sampled action: 11(0,2,3)
o...
....
....
....
x...
.o..
....
....
o..x
.x..
oo..
o...
x.ox
xx.o
xxxx
oo.o
Utility for player 0 is -1.0
Utility for player 1 is 1.0
After many iteration, Player 1 was able to align four tokens on the bottom row, winning the game!
It turns out that by default, most of the programs provided in the open_spiel/python/examples
directory do not have the import
directive that would include the games implemented in Python. To be able to launch my newly implemented game, I had to add the following import to the example program open_spiel/python/examples/mcts.py
.
+ from open_spiel.python import games
from open_spiel.python.algorithms import mcts
To launch a game of Connect Four 3D, I can then use the following command. Note that the game
flag is set to python_connect_four_3d
, matching the value for the "short_name"
in the GameType
definition.
python open_spiel/python/examples.mcts.py --game="python_connect_four_3d" --player1="human" --player2="random"
It is also possible to flip the players and make the human play second:
python open_spiel/python/examples.mcts.py --game="python_connect_four_3d" --player1="random" --player2="human"
Either way, winning against an agent that plays at random should not be too difficult.
Testing the victory conditions
If you built the Open Spiel from source, you will have noticed that it contains a fairly large number of tests. Since it is not humanly feasible to test every victory situation (the part of the implementation I am most worried about), I decided to follow the example and implement a few basic tests.
The tests I implemented can be found in connect_four_3d_test.py
. They are essentially inspired by the test suite for the Tic-Tac-Toe game.
I also tried to implement almost all possible winning combinations for the game. This is not a thorough test suite (in all of my tests Player 0 is the winning player), but this gives me some confidence that the victory detection code is implemented correctly.
Summary
We implemented a game in Python into the Open Spiel framework, tested it, and we are now able to run it using the example main programs provided with the library. The next step will be to start training some AI on this game, but that will have to wait for a follow-up …
External links
Enjoy Reading This Article?
Here are some more articles you might like to read next: